rgaudin

Results: 844 comments by rgaudin

@TheCrazyT those processes happen early on in the whole scraping process and complete before we start the intensive work. Now that we're expecting outsiders to take a look at this,...

Here's a more complete description for those not familiar with sotoki and this issue.

- sotoki creates a [ZIM](https://openzim.org) file from a Stack Exchange domain (`sotoki --list-all` lists 356 of them)
-...

Thanks for the updates. I missed those edits above because edits don't trigger a notification. FYI, a _nopic_ (no pictures at all) run of stackoverflow is about to end. This code includes...

Good point; I'd go with `self.nb_seen > self.commit_every` as well, as it seems safer.
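A minimal sketch of the counter-based batching being discussed, in Python since sotoki is a Python project. Only `nb_seen` and `commit_every` come from the comment; `BatchCommitter`, `add` and `commit` are hypothetical names, not sotoki's actual code. It shows why `>` is the safer test: when the counter can jump by more than one at a time, an `==` check may never fire.

```python
from typing import Any, List


class BatchCommitter:
    """Hypothetical helper: buffers writes and flushes them in batches."""

    def __init__(self, commit_every: int) -> None:
        self.commit_every = commit_every
        self.nb_seen = 0
        self.pending: List[Any] = []
        self.batches: List[List[Any]] = []  # committed batches, kept for illustration

    def add(self, *items: Any) -> None:
        """Buffer one or more items, then commit if the threshold was crossed."""
        self.pending.extend(items)
        self.nb_seen += len(items)
        # `>` instead of `==`: when several items arrive at once, the counter
        # can jump past `commit_every` and an equality test would never fire.
        if self.nb_seen > self.commit_every:
            self.commit()

    def commit(self) -> None:
        """Flush pending items (stand-in for, e.g., executing a DB pipeline)."""
        if self.pending:
            self.batches.append(self.pending)
            self.pending = []
        self.nb_seen = 0
```

With `commit_every=3`, calling `add("a", "b")` then `add("c", "d", "e")` moves the counter from 2 to 5: `5 > 3` triggers a commit, whereas `5 == 3` would have let the buffer keep growing.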

Please submit a PR once you have a working version; thanks 🙏

@parvit, thank you for your contribution, but compression and full-text indexing are enabled by default on purpose, because that's what's wanted 99% of the time. It is still possible...

@TheCrazyT PR #268 did not improve memory usage. Compare the durations, as the second image is incomplete (I had to restart the backend). I will try @parvit's suggestion...

OK, here's the graph of the run with neither indexing nor compression.

[Graphs: **baseline** and **nocomp-noindex**]

This is very interesting. After the initial hill, the curve only goes down. Also, on this...

We could also introduce a *metadata* entry at the ZIM level indicating that the reader should not touch the DOM; in that case, the content itself would be responsible for injecting that commonly-named JS file...
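A minimal sketch of how a reader might honour such a flag, assuming the ZIM's metadata has already been read into a plain dict. The key `X-NoDOMInjection` and the function `should_reader_inject` are hypothetical: no such metadata key exists in the ZIM spec today, and this is only an illustration of the proposal, not an existing reader behaviour.

```python
from typing import Dict

# Hypothetical metadata key; not defined by the ZIM specification.
NO_DOM_INJECTION_KEY = "X-NoDOMInjection"


def should_reader_inject(metadata: Dict[str, str]) -> bool:
    """Return True if the reader should inject its own JS into the page DOM.

    When the ZIM sets the hypothetical flag to a truthy value, the reader
    leaves the DOM untouched and the content is expected to load the
    commonly-named JS file by itself.
    """
    flag = metadata.get(NO_DOM_INJECTION_KEY, "").strip().lower()
    return flag not in ("1", "true", "yes")
```

Defaulting to injection when the key is absent keeps existing ZIM files behaving as they do now; only content that explicitly opts out takes over responsibility for loading the script.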

#322 as well, but I'm looking less at the bugs it could fix than at what it enables:

* Easy custom branding
* Easy internationalization of the taskbar
* Easy...