rgaudin

Results: 846 comments by rgaudin

We definitely should, but we should keep an alternative method for S3 download/upload (I/O-bound) and ffmpeg (CPU-bound).

Please improve this ticket's description. The title makes little sense: _extracting_ a file from the ZIM is not the responsibility of this scraper…

Since it's extracted from the current URL, it would be good to have its value(s) https://github.com/openzim/warc2zim/blob/main/src/warc2zim/templates/load.js#L24

> Is that a sustainable solution?

What do you mean? Do you mean you want to actually fetch the YT source code and remove/hide parts of it? That would be...

> @rgaudin would know best for the darkmode part.

Nothing to know here.

Yes, the run would have failed; and there are conversion functions to use.

@wsdookadr thank you for this. I agree that given how much we are dependent on other projects in warc2zim (and even more with zimit), it would be a very useful...

Size does matter. Scraping over a TB off a third-party website is resource-intensive for us and for them. zimit is an _uncontrolled environment_ and we don't have tools...

Not sure I fully understand the question, but creating, storing and uploading a TB-sized ZIM file is possible, yes. I think you're referring to the manioc.org ZIM.