benoit74

Results: 1993 comments by benoit74

WDYT of simply adding a `--customCss` CLI argument that takes a comma-separated list of URLs expected to contain CSS, which will be downloaded, added to the ZIM and...
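To make the proposal a bit more concrete, here is a minimal sketch of how such a comma-separated value could be split and validated before any download happens. The function name `parse_custom_css` and the validation rules are assumptions for illustration only, not existing code from the scraper.

```python
from urllib.parse import urlparse


def parse_custom_css(value: str) -> list[str]:
    """Split a comma-separated option value into a list of http(s) URLs.

    Hypothetical helper: the name and the validation rules are assumptions.
    """
    urls = []
    for item in value.split(","):
        url = item.strip()
        if not url:
            continue  # tolerate trailing commas and empty entries
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"not an http(s) URL: {url!r}")
        urls.append(url)
    return urls


if __name__ == "__main__":
    print(parse_custom_css("https://example.com/a.css, https://example.com/b.css"))
```

One caveat with this approach: a URL that itself contains a comma would be split incorrectly, so a repeatable argument might be a safer alternative if that case matters.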

Upstream issue: https://github.com/webrecorder/browsertrix-crawler/issues/776

Reopening, the ZIM is still not published. The last run (https://farm.openzim.org/pipeline/fda86c72-527b-480c-963d-5160336068c5) was processing the website fairly successfully, but I had to stop it at 29% (338277 / 1150670) - yes,...

All these pages do really exist. They might be a bit "virtual" in the sense that they are URLs to pages which are dynamically rendered on the real server, or...

Analyzing this website, it looks like a giant one. For instance, we even have whole books of tens of thousands of pages (e.g. https://ganjoor.net/t6e?p=85541). Unfortunately I do not...

Seen on https://farm.openzim.org/pipeline/8b75b35d-65e6-41db-a23d-df89e841d255/debug, but also https://farm.openzim.org/pipeline/f464bfb0-d82d-4485-9222-786f369d62e8/debug, https://farm.openzim.org/pipeline/cfa81b8a-520d-4dd2-84ee-03811685922f/debug

I've opened https://phabricator.wikimedia.org/T409450 on the Wikimedia side

I don't think we can customize this setting, but @ikreymer probably knows better than I do

Recipe created at https://farm.openzim.org/recipes/www.professeurphifix.net_fr_all ; for now limited to 100 pages to check behavior