New ZIM request: Please make Wikipedia ZIM files with audio content
Example: https://en.wikipedia.org/wiki/Piano_Sonata_No._14_(Beethoven). Or make Wikipedia-music ZIM files.
Please use the following format for a ZIM creation request (and delete unnecessary information)
- Website URL: https://wikipedia.org
- License: CC-by-sa | Public domain | ... (do NOT request copyrighted material)
- Desired ZIM Title: Wikipedia with images and audio
- Desired ZIM Description: Short description
- Desired ZIM Icon –png (URL or attach one): https://abc.com/icon.png
- Language (ISO 639-3): eng
- Desired Main Page (homepage, if different from website URL): n/a
- Is this a MediaWiki?: **yes**
@benoit74 @rgaudin @Popolechien The question here is: should we consider publishing ZIM files with everything (video/audio included)? We have no flavour for this (not a bug), it would be wikipedia_en_all_2027-07.zim
Do we have a ballpark figure for the size increase this would incur?
> We have no flavour for this (not a bug), it would be wikipedia_en_all_2027-07.zim

Indeed.
> Do we have a ballpark figure for the size increase this would incur?

No, we don't. It could be significant, especially if we include video and audio and do not re-encode. But it is hard to say how significant without a test run on a sizeable subset. What could such a subset be?
Can you help build this project TSV? It seems that this project is not followed by WP1: https://download.openzim.org/wp1/enwiki_2025-08/projects/
Here is a .tsv from the classical composition category: plenty of audio in there
Just created and requested https://farm.openzim.org/recipes/wikipedia_en_classical-composition to give it a try.
It will be hard to extrapolate from the results, since this is de facto kind of a worst-case scenario with lots of audio, but at least we will have a first figure.
See https://farm.openzim.org/pipeline/84254595-a908-4e80-9795-24aff1ebbe39
- wikipedia_en_classicalcomposition_maxi_2025-08.zim: 53M
- wikipedia_en_classicalcomposition_2025-08.zim: 1.29GiB
=> audio files are big; I'm not in favor of publishing them systematically, since usage will probably be mostly negligible (very hard to download / store anyway) and the costs for us and Wikimedia would be very significant (storage, bandwidth, ...).
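
For a rough sense of scale, here is a quick back-of-the-envelope calculation on the two sizes reported above. It is only a sketch: it assumes "53M" means 53 MiB and that the 1.29GiB file is the one that includes audio. The `size_ratio` helper is hypothetical, written just for this illustration.

```python
# Sizes reported in the thread for the classical-composition test run.
# Assumption: "53M" means 53 MiB; the 1.29 GiB file is the one with audio.
MIB_PER_GIB = 1024


def size_ratio(with_media_gib: float, without_media_mib: float) -> float:
    """Return how many times larger the media-inclusive ZIM is."""
    return (with_media_gib * MIB_PER_GIB) / without_media_mib


ratio = size_ratio(1.29, 53)
print(f"ZIM with audio is about {ratio:.1f}x larger")
```

Under those assumptions the file with audio comes out roughly 25 times larger. Since this selection is audio-heavy, the factor should not be applied as-is to the full wikipedia_en_all.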
WDYT?
That's a bummer, but not a surprise. Agree this is something we should put on ice for the time being.
Now, classical music may be a warranted exception, but as long as we do not have a clear strategy for which subsets are to be released I would probably park it as well.