extraction issues

Use lxml if available and parse head only.

Although `README.md` suggested that `lxml` would be used if available, the code forced the parser choice to `html5lib` only. Also, I have added checks to parse...

musically-ut
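As a rough illustration of the idea in the pull request above (prefer `lxml` when it is installed, and look only at the document head), here is a minimal standard-library sketch. The helper names `pick_parser` and `head_only`, the fallback order, and the regex-based truncation are assumptions made for this example, not the project's actual code.

```python
import importlib.util
import re

# Preference order is an assumption for this sketch: lxml first, then
# html5lib, with the standard-library parser as a last resort.
PARSER_MODULES = [("lxml", "lxml"), ("html5lib", "html5lib")]


def pick_parser():
    """Return the name of the first installed parser, else 'html.parser'."""
    for parser_name, module_name in PARSER_MODULES:
        # find_spec returns None when the top-level module is not installed.
        if importlib.util.find_spec(module_name) is not None:
            return parser_name
    return "html.parser"


# Matches the closing head tag regardless of letter case.
HEAD_END = re.compile(r"</head\s*>", re.IGNORECASE)


def head_only(markup):
    """Cut the document just after </head>; return it unchanged if absent."""
    match = HEAD_END.search(markup)
    return markup[:match.end()] if match else markup
```

The returned name could then be passed explicitly as the second argument to `BeautifulSoup(markup, parser_name)`, with `head_only(markup)` supplying a shorter input when only metadata in the head is needed.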

Add technique for extracting outgoing links from the content

Currently only canonical URLs are extracted. It would be fairly easy to add a technique that also extracts outgoing links, and possibly relative links and images. Maybe these shouldn't...

jayvdb
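One way the link-extraction request above could be approached is sketched below using only the standard library. The `LinkCollector` class, the `extract_links` function, and the shape of the returned dictionary are hypothetical names chosen for this example; they do not reflect the project's technique interface.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect absolute URLs from <a href> and <img src> attributes."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.images = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # urljoin resolves relative references against the page URL.
        if tag == "a" and attrs.get("href"):
            self.links.append(urljoin(self.base_url, attrs["href"]))
        elif tag == "img" and attrs.get("src"):
            self.images.append(urljoin(self.base_url, attrs["src"]))


def extract_links(markup, base_url):
    """Split a page's links into outgoing and internal, and list its images."""
    collector = LinkCollector(base_url)
    collector.feed(markup)
    collector.close()
    base_host = urlparse(base_url).netloc
    # Keep only web links; mailto:, javascript: and similar are skipped.
    web_links = [u for u in collector.links
                 if urlparse(u).scheme in ("http", "https")]
    return {
        "outgoing": [u for u in web_links if urlparse(u).netloc != base_host],
        "internal": [u for u in web_links if urlparse(u).netloc == base_host],
        "images": collector.images,
    }
```

Treating a link as outgoing when its host differs from the page's host is a simplification; subdomains of the same site would count as outgoing under this rule.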

Fixed BeautifulSoup warning

Fixed this warning from BeautifulSoup: `UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("lxml"). This usually isn't a problem, but...`

fabiancabau

Support for Python 3

I don't know if this is on the roadmap for this project, but it would be nice! Even without rewriting everything, it might be possible to easily support Python 3...

cassidylaidlaw