zim-requests
New ZIM: Manioc
- Website URL: http://www.manioc.org/
- License: Not specified (open-access consultation and download of several tens of thousands of documents), so CC BY-NC-ND?
- Desired ZIM Title: Manioc
- Desired ZIM Description: The Manioc collaborative digital library offers open-access consultation and download of several tens of thousands of old and contemporary documents (texts, audio, images, and videos) about the territories and societies of the Caribbean, Amazonia, the Guiana Shield, and related regions and areas of interest.
- Desired ZIM Icon (PNG, URL or attach one):

- Language (ISO 639-3): fra
- Desired Main Page (homepage): n/a
- Is this a MediaWiki?: no
- Articles List URL (mediawiki): n/a
@barbayellow Might be doable with Zimit
We are impacted by https://github.com/openzim/warc2zim/issues/71, blocking this ticket.
@kelson42 Is this request still blocked?
@JulienMoraliBSF We should have another look, but I'm pretty pessimistic. I don't remember specifically why we failed to scrape this website with zimit... but it was a hard problem.
@JulienMoraliBSF Found it! https://github.com/openzim/warc2zim/issues/71
@kelson42 Thanks for the update. Just to make sure I understand the conclusion: we can't create the ZIM, right?
@JulienMoraliBSF It is not a definitive no, but there is a serious technical burden.
This now seems OK to proceed with zimit2, except that we need to develop a custom behavior to load all pages of each resource.
See e.g. https://www.manioc.org/recherch/HASH256ee3a10e5a5515e58b9e: one needs to code a custom behavior that clicks every "next page" button so that all pages get loaded into the ZIM.
This is probably the first thing to do: develop a custom behavior and try to ZIM only this single resource. Based on that, we will have gained knowledge about feasibility, both technically and in terms of the time needed to crawl all pages. (I'm a bit concerned by the fact that there are 10k+ resources, and many are books of hundreds of pages. I'm not sure web crawling this is really doable.)
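The "click every next page button" loop described above can be sketched as follows. This is a hypothetical illustration only: it does not follow the actual Browsertrix Crawler custom-behavior API that zimit relies on, and the `Pager` interface, `clickThroughAllPages`, and `MockPager` are invented names. In a real behavior, `hasNextPage` and `clickNextPage` would wrap DOM queries against Manioc's (unknown here) markup and wait for the new page to render before returning.

```typescript
// Hypothetical abstraction over the resource viewer being crawled.
// In a browser these methods would query and click the "next page" button.
interface Pager {
  hasNextPage(): Promise<boolean>;
  clickNextPage(): Promise<void>; // assumed to resolve once the next page is loaded
}

// Clicks "next page" until no such button remains, or until a safety cap is
// reached, and returns how many pages were loaded beyond the first one.
async function clickThroughAllPages(pager: Pager, maxClicks = 10000): Promise<number> {
  let clicks = 0;
  while (clicks < maxClicks && (await pager.hasNextPage())) {
    await pager.clickNextPage();
    clicks++;
  }
  return clicks;
}

// Mock pager simulating a resource with `total` pages, for local testing.
class MockPager implements Pager {
  current = 1;
  constructor(readonly total: number) {}
  async hasNextPage(): Promise<boolean> {
    return this.current < this.total;
  }
  async clickNextPage(): Promise<void> {
    this.current++;
  }
}
```

For a 5-page resource, `clickThroughAllPages(new MockPager(5))` resolves to 4 clicks. The cap guards against a viewer whose "next page" button never disappears, which would otherwise keep the crawler stuck on one resource.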
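The crawl-time concern can be made concrete with a back-of-envelope estimate once the single-resource test gives a measured time per page. The sketch below is a simple multiplication under stated assumptions. Only the "10k+ resources" figure comes from this thread; the pages per resource, seconds per page, and number of parallel workers are placeholders, and `estimateCrawlHours` is a hypothetical helper.

```typescript
// Inputs are assumptions to be replaced with measurements from a test crawl.
interface CrawlAssumptions {
  resources: number;           // number of resources to crawl
  avgPagesPerResource: number; // average number of pages per resource
  secondsPerPage: number;      // load + "next page" click + capture, per page
  workers: number;             // parallel browser workers (assumed)
}

// Total pages x time per page, spread over parallel workers, in hours.
function estimateCrawlHours(a: CrawlAssumptions): number {
  if (a.workers <= 0) throw new RangeError("workers must be positive");
  const totalPages = a.resources * a.avgPagesPerResource;
  return (totalPages * a.secondsPerPage) / a.workers / 3600;
}
```

With purely illustrative inputs (10,000 resources, 200 pages each, 3 s per page, 4 workers), the estimate is about 417 hours, roughly 17 days of crawling. The point is less the number itself than which input dominates: the page count per resource multiplies everything else.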
Just because this is not documented here: we have created a ZIM, see https://farm.openzim.org/recipes/manioc.org. But it seems that this is still not perfect, if I understand @benoit74 correctly.
ZIM is gone ...
@Popolechien @benoit74 What happened? I can't find any trace here in the repo about that deletion?!
All storage was gone when Hetzner deleted our machine. And we do not backup non-prod ZIMs.
So we could redo it?
See https://github.com/openzim/zim-requests/issues/260#issuecomment-2453118292, probably yes, but not straightforward