rgaudin

Results 844 comments of rgaudin

> We could have a ZIM metadata "source_url" and then allow library.kiwix.org to filter on it?

Yes, that's an interesting feature for which the default behavior might be tricky: how...

Thank you, I opened an upstream ticket https://github.com/webrecorder/browsertrix-crawler/issues/36

We've seen this already (and I believe from the very beginning of zimit) but never could really pin it down as it only affects some ZIMs and trying to isolate...

Ah I forgot to paste the log part of me accessing said ZIM in dev.library, using my main FF browser (not working) ``` ====================== Requesting : full_url : /ncert-audiobooks_en_all_2022-07/ method...

Ah! Interesting clue; thank you.

zimit is not using the latest version of the crawler. Maybe this option changed? We're waiting for a new pylibzim release to update zimit to the latest crawler and replayer.

I suppose you missed the `--statsFilename` param… Please reopen if I misinterpreted the problem.

Oh I see, I believe it's not related to the two parallel scrapes. The numbers are always updated as new links are found in newly scraped pages. Could you test...

Ah… OK; I finally understand your problem. Indeed, inside the specified `--output` directory, zimit will create `crawler.json` and `warc2zim.json` temporary files regardless of the `--statsFilename` specified. If running two...
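
For illustration, one possible way to keep those temporary files apart is to give each run its own `--output` directory. This is only a sketch, assuming the concern is two zimit runs sharing one output directory. The helper name `build_zimit_args`, the `stats.json` file name and the `extra_args` parameter are hypothetical; only `--output` and `--statsFilename` come from the comment above, and how zimit resolves the stats path is not checked here.

```python
import tempfile
from pathlib import Path


def build_zimit_args(run_name: str, base_dir: Path, extra_args: list[str]) -> list[str]:
    """Build a zimit argument list that uses a dedicated output directory.

    Hypothetical helper: it only assembles arguments, it does not run zimit.
    """
    # mkdtemp creates a new, uniquely named directory on every call, so the
    # crawler.json and warc2zim.json of one run never share a folder with
    # those of another run.
    run_dir = Path(tempfile.mkdtemp(prefix=f"{run_name}_", dir=base_dir))
    return [
        "zimit",
        "--output", str(run_dir),
        # Assumption: a stats file placed inside the per-run directory.
        "--statsFilename", str(run_dir / "stats.json"),
        # Any other options (target URL, ZIM name, ...) are passed through as-is.
        *extra_args,
    ]
```

Calling the helper twice with the same `base_dir` yields two different `--output` directories, so both argument lists can be launched side by side (for instance with `subprocess.run`).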