benoit74

Results 370 issues of benoit74

PWA: pwa.kiwix.org, 3.3.2 ZIM: https://mirror.download.kiwix.org/zim/.hidden/dev/mes-quartiers-chinois_fr_all_2024-05.zim Safari: 17.5 on macOS Sonoma 14.5 (both also observed by @Jaifroid on iPhone 15 Pro Max with iOS 17 Safari). The YouTube video (see eg....

regression
bug-non-critical
zimit

For some reason, https://library.kiwix.org/viewer#edu.gcfglobal.org_en_all_2024-06 is not properly redirecting to https://library.kiwix.org/viewer#edu.gcfglobal.org_en_all_2024-06/edu.gcfglobal.org/en/topics/ . The viewer loads, but the iframe is then not redirected to the proper resource. However https://library.kiwix.org/viewer#edu.gcfglobal.org_en_all_2024-06/ is...

bug
kiwix-serve

ZIM: https://dev.library.kiwix.org/viewer#fas-military-medicine_en_2024-05/irp.fas.org/doddir/milmed/index.html Chrome: 125 OS: macOS Sonoma 14.5 Clicking on "Steve Aftergood" at the bottom of the front page should open a `mailto:` link. This does not happen....

bug
kiwix-serve

ZIM: mes-quartiers-chinois_fr_all_2024-05 on dev.library.kiwix.org Scraper: warc2zim 2.0.0-dev8 + zimit 2.0.0-dev5 + Browsertrix crawler 1.1.3 Browser: Firefox 126.0 on macOS Sonoma 14.5 When clicking on a link with `target="_blank"`, this...

bug
kiwix-serve

It would be nice if the crawler could automatically fetch rules from `robots.txt` and add `exclusion` rules for every rule present in the `robots.txt` file. I think this functionality should...
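As a rough illustration of the idea in this request, the sketch below turns the `Disallow` lines of a `robots.txt` file into regex strings that could serve as exclusion rules. The function names (`disallow_to_regex`, `exclusions_from_robots`) are hypothetical, the parsing is deliberately simplified (it ignores `Allow` lines), and how the resulting patterns would be handed to the crawler is left out because the excerpt does not say.

```python
import re


def disallow_to_regex(base_url, path_pattern):
    """Turn one robots.txt Disallow path into a regex string matching full URLs.

    Hypothetical helper: supports the common '*' wildcard and a trailing '$'.
    """
    anchored = path_pattern.endswith("$")
    if anchored:
        path_pattern = path_pattern[:-1]
    # Escape literal pieces, then join them with '.*' where robots.txt had '*'.
    body = ".*".join(re.escape(part) for part in path_pattern.split("*"))
    prefix = re.escape(base_url.rstrip("/"))
    return "^" + prefix + body + ("$" if anchored else "")


def exclusions_from_robots(base_url, robots_txt, user_agent="*"):
    """Collect exclusion regexes for the groups that name `user_agent`.

    Simplified on purpose: 'Allow' lines are ignored, so the result may
    exclude more than a full robots.txt interpretation would.
    """
    rules = []
    agents = []
    last_was_agent = False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line or ":" not in line:
            continue
        field, value = line.split(":", 1)
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            # Consecutive User-agent lines share one group of rules.
            if not last_was_agent:
                agents = []
            agents.append(value.lower())
            last_was_agent = True
            continue
        last_was_agent = False
        if field == "disallow" and value and user_agent.lower() in agents:
            rules.append(disallow_to_regex(base_url, value))
    return rules


if __name__ == "__main__":
    sample = "User-agent: *\nDisallow: /private/\nDisallow: /*.pdf$\n"
    for rule in exclusions_from_robots("https://example.org", sample):
        print(rule)
```

The patterns are anchored on the site's base URL so that a rule for one host would not accidentally exclude pages from another.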

I'm trying to create a login profile for www.solidarite-numerique.fr, in order to set cookies which will disable the display of the banners highlighted in green in the screenshot below. ![image](https://github.com/webrecorder/browsertrix-crawler/assets/7102089/325f5eda-aa3c-4d08-8a0d-c2e538c9b81f) Banner 1...

The Debian distro now requires the use of virtual environments, so as not to interfere with dependencies installed by official apt packages. This commit also removes tldextract update now that pywb is not...

We have three things which can stop the crawler in the middle of a run: - `--sizeLimit`: the maximum warc size - `--timeLimit`: the maximum duration of the crawl -...

Lots of web frameworks store custom data in `data-xxx` attributes, which are quite standard: https://www.w3schools.com/tags/att_global_data.asp While these attributes are custom per application, they regularly contain URLs to assets that will...

enhancement
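To make the `data-xxx` request above more concrete, here is a minimal sketch of how URL-like values could be spotted in such attributes, using only the Python standard library. The class name `DataAttrURLFinder` and the "looks like a URL" heuristic (value starts with `http://`, `https://`, `//` or `/`) are assumptions made for illustration; they do not describe how any existing scraper behaves.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class DataAttrURLFinder(HTMLParser):
    """Collect (attribute name, absolute URL) pairs found in data-* attributes."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.found = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if not name.startswith("data-") or not value:
                continue
            # Assumed heuristic: only values that look like a URL or an
            # absolute path are treated as candidate asset links.
            if value.startswith(("http://", "https://", "//", "/")):
                self.found.append((name, urljoin(self.base_url, value)))


if __name__ == "__main__":
    finder = DataAttrURLFinder("https://example.org/page")
    finder.feed('<div data-src="/img/a.png" data-count="3"></div>')
    print(finder.found)
```

A heuristic this simple would miss relative paths without a leading slash and URLs embedded in JSON values, so it should be read as a starting point only.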

Scraping large websites (millions of pages) is challenging because: - since the scrape takes a long time to complete, the chance that the website changes during the crawl is significant: - this can...

enhancement
question