tosback2 icon indicating copy to clipboard operation
tosback2 copied to clipboard

Reimplementing TOSBack using git as a database layer!

This is TOSBack version 2, a clean redesign & reimplementation of EFF's TOSBack project.

It uses Git as an inherently and efficiently versioned backend storage database.

After cloning the git repository, you need to execute this command:

git submodule update --init --recursive

That will fetch a recent version of the GitPython code, which we depend upon.

BUGS IN WGET

If you want to actually run the crawler yourself (not really necessary unless you're testing something), be aware that TOSBack2 also exposes a number of bugs in common versions of wget. As of December 2011, there are two bugs you might need to patch yourself!

(FOR YOUR CONVENIENCE, a patched version of the wget source can be found in lib/wget-1.13.4/ . There is also a binary .deb that Debian and Ubuntu users can try in lib/. More hints on building from source below)

  1. Versions of wget built against gnutls may suffer from fatal memory leaks https://lists.gnu.org/archive/html/bug-wget/2011-10/msg00050.html (so apply that patch, or build against openssl using ./configure --with-ssl=openssl).

  2. You should also apply the following patch https://savannah.gnu.org/support/download.php?file_id=24473 to fix this bug: https://savannah.gnu.org/bugs/?21714

HINTS FOR BUILDING WGET FROM SOURCE ON DEBIAN OR UBUNTU

sudo apt-get build-dep wget cd lib/wget-1.13.4/ fakeroot debian/rules binary an installable .deb file should be written to the lib/ directory