zim-requests New ZIM: Manioc

Website URL: http://www.manioc.org/
License: Not specified (free-access consultation and download of several tens of thousands of documents), so CC BY-NC-ND?
Desired ZIM Title: Manioc
Desired ZIM Description: The collaborative digital library Manioc offers free-access consultation and download of several tens of thousands of historical and contemporary documents (text, audio, images, and video) concerning the territories and societies of the Caribbean, the Amazon, the Guiana Shield, and related regions and areas of interest.
Desired ZIM Icon –png (URL or attach one):
Language (ISO 639-3): fra
Desired Main Page (homepage): n/a
Is this a MediaWiki?: no
Articles List URL (mediawiki): n/a

May 11 '20 15:05 barbayellow

@barbayellow Might be doable with Zimit

Jul 05 '20 08:07 kelson42

We are impacted by https://github.com/openzim/warc2zim/issues/71, which blocks this ticket.

Dec 01 '20 11:12 kelson42

@kelson42 Is this request still blocked?

Apr 19 '22 07:04 JulienMoraliBSF

@JulienMoraliBSF We should have another look, but I am pretty pessimistic. I don't remember specifically why we failed to scrape this website with zimit... but it was a hard problem.

Apr 19 '22 08:04 kelson42

@JulienMoraliBSF Found it! https://github.com/openzim/warc2zim/issues/71

Apr 19 '22 08:04 kelson42

@kelson42 Thanks for the update. Just to make sure I understand the conclusion: we can't create the ZIM, right?

Apr 20 '22 14:04 JulienMoraliBSF

@JulienMoraliBSF It is not a definitive no, but there is a serious technical burden.

Apr 20 '22 18:04 kelson42

This seems OK to proceed with zimit2 now, except that we need to develop a custom behavior to load all pages of resources.

See e.g. https://www.manioc.org/recherch/HASH256ee3a10e5a5515e58b9e: one needs to code a custom behavior that clicks the "next page" button repeatedly so that all pages are loaded and end up inside the ZIM.

This is probably the first thing to do: develop a custom behavior and try to ZIM only this single resource. Based on that, we will have gained knowledge about the feasibility, both technically and in terms of the time needed to crawl all pages. I'm a bit concerned by the fact that there are 10k+ resources, and many are books of hundreds of pages. I am not sure web crawling this is really doable.
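To make the idea concrete, here is a minimal sketch of the "click next until there is nothing left" loop. It does not use the crawler's actual custom behavior API: the `Pager` interface, `loadAllPages`, and the `maxPages` cap are hypothetical names introduced for illustration only, and how the "next page" control is located on manioc.org is left open. A real behavior would back `hasNext` and `clickNext` with DOM queries and clicks in the viewer page.

```typescript
// Hypothetical abstraction over the document viewer. In a real crawler
// behavior, hasNext() would check the DOM for an enabled "next page"
// control and clickNext() would click it and wait for the page to load.
interface Pager {
  hasNext(): boolean;
  clickNext(): Promise<void>;
}

// Keep clicking "next page" until the control is gone or the safety cap
// is reached. Yields one progress message per newly loaded page, the way
// an async-generator style behavior would report progress.
async function* loadAllPages(
  pager: Pager,
  maxPages = 5000, // assumed cap so a broken viewer cannot loop forever
): AsyncGenerator<string> {
  let page = 1; // the first page is already displayed
  while (pager.hasNext() && page < maxPages) {
    await pager.clickNext();
    page += 1;
    yield `loaded page ${page}`;
  }
}

// In-memory stand-in for a book of `total` pages, used to try the loop
// without a browser.
function fakePager(total: number): Pager {
  let current = 1;
  return {
    hasNext: () => current < total,
    clickNext: async () => {
      current += 1;
    },
  };
}
```

Counting the messages yielded for one real resource, together with the time per click, would give a first estimate of the crawl duration for the 10k+ resources.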

Nov 02 '24 20:11 benoit74

Since this is not documented here: we have created a ZIM, see https://farm.openzim.org/recipes/manioc.org. But it seems that it is still not perfect, judging by @benoit74's comment.

Jul 06 '25 11:07 kelson42

ZIM is gone ...

Jul 06 '25 11:07 benoit74

@Popolechien @benoit74 What happened? I can't find any trace of that deletion here in the repo.

Jul 06 '25 12:07 kelson42

All storage was lost when Hetzner deleted our machine, and we do not back up non-prod ZIMs.

Jul 06 '25 13:07 benoit74

So we could redo it?

Jul 06 '25 19:07 kelson42

> So we could redo it?

See https://github.com/openzim/zim-requests/issues/260#issuecomment-2453118292: probably yes, but it is not straightforward.

Jul 07 '25 08:07 benoit74
