comcrawl

Update download url template

Open rokasramas opened this issue 3 years ago • 3 comments

The Common Crawl base URL was recently changed to "https://data.commoncrawl.org/" (see blog post), which broke page downloads. This patch updates the download URL template (fixes #40).

Also, it seems that some time ago the 'charset' attribute was renamed to 'encoding'.
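
For anyone patching this locally, here is a minimal sketch of the two changes, not the actual patch. The helper name `build_request` is hypothetical, and the sketch assumes each index result is a dict with `filename`, `offset` and `length` keys plus either an `encoding` or an older `charset` key. The UTF-8 fallback is also an assumption.

```python
# Base URL from this issue; everything else below is illustrative.
BASE_URL = "https://data.commoncrawl.org/"


def build_request(result):
    """Return (url, headers, encoding) for one index result.

    Hypothetical helper: assumes `result` carries "filename", "offset"
    and "length", and optionally "encoding" or the older "charset".
    """
    offset = int(result["offset"])
    length = int(result["length"])

    # The record lives at BASE_URL + filename; a Range header selects
    # only the bytes belonging to this record (end is inclusive).
    url = BASE_URL + result["filename"]
    headers = {"Range": f"bytes={offset}-{offset + length - 1}"}

    # Prefer the newer attribute name, fall back to the old one,
    # then to UTF-8 (assumed default).
    encoding = result.get("encoding") or result.get("charset") or "utf-8"
    return url, headers, encoding
```

The returned URL and headers can then be passed to whichever HTTP client you use, for example `requests.get(url, headers=headers)`.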

rokasramas avatar Apr 16 '22 19:04 rokasramas

The patch works great. +1 from me. Thanks!

georgegach avatar Jun 11 '22 12:06 georgegach

How do I access this fix? I recently installed the package and the error is still occurring.

GMalueg avatar Jun 27 '22 00:06 GMalueg

@GMalueg, you can install from my fork using pip install git+https://github.com/rokasramas/comcrawl.git#egg=comcrawl

rokasramas avatar Jul 05 '22 09:07 rokasramas