Special handling for known websites (WP, youtube, ted, etc)
I see that almost every day (and certainly several times a week) people are running requests for Wikipedia, Wikibooks or even YouTube. Zimit should be able to either a) switch gears to run the corresponding scraper (YouTube), or b) directly offer the latest ZIM available (Wikipedia, Wikibooks).
No, we discussed that a while back and apparently we did not create a ticket, but the idea was to have a list of known websites for which we refuse requests and display a message explaining where to find already existing ZIMs.
Switching scrapers is not practical for many reasons, mainly because we have no limits on those other scrapers.
display a message explaining where to find already existing ZIMs.
Sounds good to me and was the main point, but then the response message should identify the target and the corresponding ZIM (e.g. "here is the link to en.wikipedia.org's latest ZIM available" and not "go to download.kiwix.org/zim and figure it out").
Ideally, yes. It can probably be implemented in two steps so that this gets a chance to be done.
At first, we can redirect to the Wiki where files are listed. Or maybe the library with the new kiwix-serve is considered easy enough?
The first thing you can do is list the domains and where to point to. It's easy for those we have a category for. YouTube will require special treatment anyway, as we don't have ready-made ZIMs for all of it. I see two options:
- we keep it as it is, but add a message on request saying this is probably not what they want, and link to both the scraper and the contact form to request a custom ZIM.
- or we block the request and show a similar message
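A minimal sketch of what such a domain list could look like, assuming a simple suffix match on the requested hostname. The table name, the `block`/`warn` actions and the target URLs are hypothetical placeholders for illustration, not Zimit's actual configuration:

```python
from urllib.parse import urlsplit

# Hypothetical table: domain -> (action, where to point the user).
# Entries and target URLs are illustrative assumptions only.
KNOWN_SITES = {
    "wikipedia.org": ("block", "https://library.kiwix.org/"),
    "wikibooks.org": ("block", "https://library.kiwix.org/"),
    # No ready-made ZIM for all of YouTube: warn and link elsewhere instead.
    "youtube.com": ("warn", None),
}


def lookup_known_site(url):
    """Return (action, target) for a known site, or None if the URL is not listed."""
    host = (urlsplit(url).hostname or "").lower()
    for domain, entry in KNOWN_SITES.items():
        # Match the domain itself or any subdomain (en.wikipedia.org, www.youtube.com).
        if host == domain or host.endswith("." + domain):
            return entry
    return None
```

Note that a URL submitted without a scheme has no parsed hostname here and falls through to `None`, so real input would need normalizing first.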
Or maybe the library with the new kiwix-serve is considered easy enough?
This would have my preference by far, but when I look at domains, based on the past three months (and this doc) I think we can simply send them to wikipedia_en_all.zim
We could have a ZIM metadata "source_url" and then allow library.kiwix.org to filter on it?
We could have a ZIM metadata "source_url" and then allow library.kiwix.org to filter on it?
Yes, that's an interesting feature, for which the default behavior might be tricky: how much matching do you want? Domain? Netloc? Path? Scheme? But yeah, that would be best for us.
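To make the granularity question concrete, here is a rough sketch of how each matching level could behave. It assumes a `source_url` value as proposed above (that metadata does not exist yet); the `matches` helper and its level names are hypothetical, and the "domain" level uses a crude last-two-labels guess that is wrong for suffixes such as `co.uk`:

```python
from urllib.parse import urlsplit


def _last_two_labels(host):
    # Crude registered-domain guess; a real implementation would need a public suffix list.
    return ".".join((host or "").lower().split(".")[-2:])


def matches(source_url, requested_url, level="netloc"):
    """Compare a ZIM's (hypothetical) source_url with a requested URL at a given level."""
    src, req = urlsplit(source_url), urlsplit(requested_url)
    same_netloc = src.netloc.lower() == req.netloc.lower()
    if level == "domain":
        return _last_two_labels(src.hostname) == _last_two_labels(req.hostname)
    if level == "netloc":
        return same_netloc
    if level == "path":
        # Same host, and the request lives under the source path.
        return same_netloc and req.path.startswith(src.path)
    if level == "scheme":
        # Strictest level: the scheme must match too.
        return src.scheme == req.scheme and same_netloc and req.path.startswith(src.path)
    raise ValueError(f"unknown level: {level}")
```

With these definitions, a request for `https://fr.wikipedia.org/wiki/Paris` matches a source of `https://en.wikipedia.org/` at the "domain" level but not at "netloc", which illustrates why the default is not obvious.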
This issue has been automatically marked as stale because it has not had recent activity. It will now be reviewed manually. Thank you for your contributions.