autopager
Detect and classify pagination links
This is because of https://github.com/TeamHG-Memex/sklearn-crfsuite/issues/68
How can I use this to detect links in attributes such as data-src, meta, etc.? It only works for anchor tags.
Here is the traceback:
```
  File "/usr/local/lib/python3.6/dist-packages/autopager/autopager.py", line 51, in extract
    return list(get_shared_autopager().extract(page, direct, prev, next))
  File "/usr/local/lib/python3.6/dist-packages/autopager/autopager.py", line 112, in extract
    xseq = page_to_features(links)
  File "/usr/local/lib/python3.6/dist-packages/autopager/model.py", line 129, in...
```
Websites often provide links such as "show 20/50/100 results per page"; by following them, a crawler can fetch the same content multiple times. It'd be nice to detect these links.
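One possible heuristic for spotting such links, sketched here with the standard library only and not taken from autopager itself: treat a link as a "results per page" link when it points to the same host and path as the current page and differs only in a query parameter that looks like a page-size setting. The function name `is_page_size_link` and the parameter names in `PAGE_SIZE_PARAMS` are assumptions chosen for illustration; real sites use many other names, and both URLs are assumed to be absolute.

```python
from urllib.parse import parse_qsl, urlsplit

# Hypothetical set of query parameter names that often control page size.
PAGE_SIZE_PARAMS = {"per_page", "perpage", "page_size", "pagesize", "limit", "rows"}


def is_page_size_link(page_url, link_url):
    """Return True if link_url differs from page_url only in page-size parameters."""
    page, link = urlsplit(page_url), urlsplit(link_url)
    # A page-size link is expected to stay on the same host and path.
    if (page.netloc, page.path) != (link.netloc, link.path):
        return False
    page_query = dict(parse_qsl(page.query))
    link_query = dict(parse_qsl(link.query))
    # Collect every parameter whose value was added, removed, or changed.
    changed = {
        name
        for name in page_query.keys() | link_query.keys()
        if page_query.get(name) != link_query.get(name)
    }
    return bool(changed) and changed <= PAGE_SIZE_PARAMS
```

A crawler could skip links for which this returns True. The check is deliberately conservative: a link that changes both the page number and the page size is not flagged.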
Currently autopager classifies each `<a>` element as part of a paginator or not. Because there can be several paginators on a web page, it'd be nice to group `<a>` links...
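As a rough illustration of what such grouping could look like (a sketch under assumptions, not the approach the project uses): if each link in document order carries a label, consecutive runs of paginator labels can be treated as one paginator. The label `"O"` for non-paginator links and the labels `"PAGE"`, `"NEXT"`, and `"PREV"` are assumed here, and `group_paginator_links` is a hypothetical helper.

```python
from itertools import groupby


def group_paginator_links(labels):
    """Group indices of consecutive paginator links.

    `labels` holds one label per link in document order; "O" is assumed
    to mark a link that is not part of any paginator.
    """
    groups = []
    for is_pager, run in groupby(enumerate(labels), key=lambda pair: pair[1] != "O"):
        if is_pager:
            groups.append([index for index, _ in run])
    return groups


labels = ["O", "PAGE", "PAGE", "NEXT", "O", "O", "PREV", "PAGE"]
groups = group_paginator_links(labels)  # [[1, 2, 3], [6, 7]]
```

This simple rule merges two paginators that sit next to each other with no other link between them, so a real solution would likely also need positional or DOM information.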