Data Hoarder’s Handbook

Tools

Networking

curl

curl is a widely used command-line tool for making network requests.

Use cases:

Debug a request (-v prints the full request and response headers, -s hides the progress meter; the body is discarded via /dev/null):
$ curl -v -s http://www.example.com > /dev/null
* Rebuilt URL to: http://www.example.com/
*   Trying 93.184.216.34...
* TCP_NODELAY set
* Connected to www.example.com (93.184.216.34) port 80 (#0)
> GET / HTTP/1.1
> Host: www.example.com
> User-Agent: curl/7.58.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Accept-Ranges: bytes
< Cache-Control: max-age=604800
< Content-Type: text/html; charset=UTF-8
< Date: Sat, 13 Apr 2019 03:36:14 GMT
< Etag: "1541025663+gzip"
< Expires: Sat, 20 Apr 2019 03:36:14 GMT
< Last-Modified: Fri, 09 Aug 2013 23:54:35 GMT
< Server: ECS (dcb/7EC9)
< Vary: Accept-Encoding
< X-Cache: HIT
< Content-Length: 1270
<
{ [1270 bytes data]
* Connection #0 to host www.example.com left intact

Check whether a URL exists (-I issues a HEAD request, so only the response headers are returned):
$ curl -I http://www.example.com
HTTP/1.1 200 OK
Content-Encoding: gzip
Accept-Ranges: bytes
Cache-Control: max-age=604800
Content-Type: text/html; charset=UTF-8
Date: Sat, 13 Apr 2019 03:36:55 GMT
Etag: "1541025663"
Expires: Sat, 20 Apr 2019 03:36:55 GMT
Last-Modified: Fri, 09 Aug 2013 23:54:35 GMT
Server: ECS (dcb/7F3B)
X-Cache: HIT
Content-Length: 606

wget

httrack

rsync

opendirindexer

opendirindexer is a command-line tool for indexing open directories. It crawls the server and prints a list of the URLs it finds.

Example usage:

$ opendirindexer http://localhost:8000
http://localhost:8000/4.txt
http://localhost:8000/test1/1.txt
http://localhost:8000/test2/3.txt
http://localhost:8000/test1/test1.1/1.txt

OpenDirectoryDownloader

OpenDirectoryDownloader is the tool used to generate the statistics posted in the opendirectories subreddit. It has been officially open sourced and can be found on GitHub. It supports indexing HTTP, FTP, and Google Drive resources.
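
A minimal sketch of an invocation, assuming the --url flag of current releases (run the binary with --help to confirm the options available in your build):

$ OpenDirectoryDownloader --url "http://example.com/files/"

A scan typically finishes by writing the discovered file URLs to a text file and printing a statistics summary to the console.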

lftp

ipfs

Text processing

urlcat

urlcat is a command-line utility that lets you extract parts of URLs.
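
As an illustration only, a pipeline could look like the line below; the host selector shown here is an assumption rather than urlcat's documented interface, so check its help output for the actual syntax:

$ cat urls.txt | urlcat host   # hypothetical selector: print only the hostname of each URL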

grep

wc

uniq

sort

sed

Other

img2ascii

img2ascii is a tool that renders images in the terminal. It can be combined with inotifywait to preview images in a terminal as they are downloaded, like this:

$ inotifywait -r -m /download_dir | while read dir event file; do [[ $event == *CREATE* ]] && echo "$dir$file" && sleep 1 && img2ascii -converter=24bit2x -width=80 "$dir$file"; done

Online tools

Fusker

From Wikipedia:

Fusker is a type of website or utility that extracts images from a web page, typically from free hosted galleries.

Communities

Hardware

HDD

NAS

Tape