rgaudin comments

Results: 846 comments of rgaudin

Simplify Gutenberg scraping (no more rsync, no more fallback URLs / filenames)

Great news! We'll test and integrate it.

Use python-libzim

Great! Maybe leave progress reporting for later? Switching to libzim would already be a great achievement.

Sync with translatewiki (Getting more locales)

In your ticket, I see `Prefix: mwoffliner-`. Is that a typo?

> Can you split the stuff to have one file per language like we have in other Python projects...

Sync with translatewiki (Getting more locales)

It looks like TW supports gettext, so that's probably what we're going to use. We also have strings in JS code, so we'd need to assess tools. https://guillaumepotier.github.io/gettext.js/ would help a single format...
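For the Python side, gettext with one catalog per language could look roughly like the sketch below. It uses only the standard `gettext` module; the `locale` directory and the `messages` domain are assumptions for illustration, not the project's actual layout, and the JS side is out of scope here.

```python
import gettext

# Assumed layout (one catalog per language):
#   locale/<lang>/LC_MESSAGES/messages.po  -> compiled to messages.mo
LOCALE_DIR = "locale"  # hypothetical directory name
DOMAIN = "messages"    # hypothetical gettext domain


def get_translator(lang: str):
    """Return a gettext function for `lang`, falling back to source strings."""
    translation = gettext.translation(
        DOMAIN,
        localedir=LOCALE_DIR,
        languages=[lang],
        fallback=True,  # NullTranslations instead of an error if no .mo exists
    )
    return translation.gettext


_ = get_translator("fr")
print(_("Download"))  # source string is returned when no French catalog exists
```

With `fallback=True`, a missing locale degrades to the untranslated strings instead of crashing the scraper.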

Die if the upstream server is not reachable

I think `retrying` is probably the way to go here; that's what we do in other scrapers. We use `backoff`, but `retrying` seems like a better choice.
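The retry-then-die behavior can be sketched with the standard library alone. This is not the API of `retrying` or `backoff` (both wrap the same idea in decorators); `call_with_retries` and `fetch_or_die` are hypothetical helper names chosen for illustration.

```python
import sys
import time
import urllib.error
import urllib.request


def call_with_retries(func, max_attempts=5, base_delay=1.0,
                      retry_on=(urllib.error.URLError, TimeoutError)):
    """Call func(); on a network error, wait and retry with exponential backoff.

    After max_attempts failures, the last exception is re-raised.
    Note: HTTPError subclasses URLError, so HTTP error statuses are retried too.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on as exc:
            if attempt == max_attempts:
                raise  # out of attempts: let the caller decide to die
            delay = base_delay * 2 ** (attempt - 1)  # 1s, 2s, 4s, ... by default
            print(f"attempt {attempt} failed ({exc}); retrying in {delay}s",
                  file=sys.stderr)
            time.sleep(delay)


def fetch_or_die(url):
    """Fetch url, or exit the whole scraper with a non-zero status."""
    def fetch():
        with urllib.request.urlopen(url, timeout=30) as response:
            return response.read()

    try:
        return call_with_retries(fetch)
    except (urllib.error.URLError, TimeoutError) as exc:
        sys.exit(f"upstream server unreachable: {exc}")
```

Exiting via `sys.exit` with a message gives a non-zero status, so a scheduler sees the run as failed rather than as a partial success.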

Die if the upstream server is not reachable

@benoit74 I believe the problem is not the calls themselves but the fact that they are treated independently, blindly. We are using a single target host and we have more...
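One way to stop treating calls independently is to have every call to the single host report to one shared object, so the scraper gives up after a run of consecutive failures instead of letting each call retry on its own. A minimal sketch follows; `HostHealth`, `HostUnreachable`, and `guarded_call` are hypothetical names, and the threshold is an arbitrary assumption.

```python
import threading


class HostUnreachable(Exception):
    """Raised once the shared failure budget for the host is exhausted."""


class HostHealth:
    """Consecutive-failure counter shared by all calls to one upstream host."""

    def __init__(self, max_consecutive_failures=10):
        self.max_consecutive_failures = max_consecutive_failures
        self._failures = 0
        self._lock = threading.Lock()  # calls may come from several worker threads

    def check(self):
        """Fail fast if the host has already been declared down."""
        with self._lock:
            if self._failures >= self.max_consecutive_failures:
                raise HostUnreachable(
                    f"{self._failures} consecutive failures; giving up on host")

    def record_success(self):
        with self._lock:
            self._failures = 0  # any success resets the streak

    def record_failure(self):
        with self._lock:
            self._failures += 1


def guarded_call(health, func):
    """Run func() unless the host is considered down; update the shared state."""
    health.check()
    try:
        result = func()
    except OSError:  # URLError and TimeoutError are both OSError subclasses
        health.record_failure()
        raise
    health.record_success()
    return result
```

Because `HostUnreachable` is not an `OSError`, it propagates past per-call error handling and can be caught once at the top level to terminate the scraper.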

Die if the upstream server is not reachable

@kevinmcmurtrie, your input is important and duly noted; it's not the first time you've shared this with us. While it's a highly important point for our process, changing stack...

Consider a better support of ZIM files without books in HTML

The most difficult part here is the one that hasn't been mentioned: the UI. With our generic UI that... What do entries look like? An HTML shell that displays epub.js...

Project Gutenberg ZIMs are barely accessible for clients that do not support JavaScript in the ZIM

I share this conclusion. I'd prefer more scrapers to work without JS, but that's hardly realistic. Some of them simply depend on JS, and others, like Gutenberg, are built...

Stack Overflow name is not visible (orange on orange)

@Rayan-Rasheed I believe it is, but we only refresh StackOverflow once or twice a year. What should be done here is to tweak the scraper to not download/process data...