Wikipedia_en_top_all has 829k entries instead of 50k
ZIM(s) location
https://browse.library.kiwix.org/#lang=eng&q=best+of+wikipedia
Recipe(s) URL
https://farm.openzim.org/recipes/wikipedia_en_top
Readers tested
- [ ] Kiwix-serve on iOS (iPad / iPhone)
- [ ] Kiwix-serve on Android (phone or tablet)
- [ ] Kiwix-serve on Windows
- [ ] Kiwix-serve on Linux
- [ ] Kiwix-serve on Raspberry Pi (e.g. hotspot)
- [ ] Kiwix-serve on Mac
- [ ] pwa.kiwix.org
- [ ] Kiwix JS - Chrome extension
- [ ] Kiwix JS - Firefox extension
- [ ] Kiwix JS - Edge extension
- [x] Kiwix for Android application
- [x] Kiwix for MacOS application
- [ ] Kiwix for iOS (iPad/iPhone) application
Which ZIM versions are impacted?
All PROD versions are impacted
Details
Two users reported on Reddit that the ZIM file has a lot more entries than expected, on both Apple and Android devices.
@benoit74 This looks like the redirects are wrongly counted in the articleCount!
Copying @Jaifroid's comment here as it may be an interesting insight:
the format of the recent Wikimedia ZIMs produced by mwOffliner has switched from minorVersion 0 (with a separate A/ article namespace) to minorVersion 3 (with only a C/ content namespace which includes all user content, including images, and no old titleIndex). If the software is looking for an article count based on the length of the old title index, it's going to calculate the wrong value. I'm not sure this is what is happening, but given that it used to show the correct article count and no longer does since May/June this year
These numbers should be based on Counter Metadata, see https://wiki.openzim.org/wiki/Metadata. The libkiwix provides the primitives to have both article and media counts.
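As a rough illustration of what "based on Counter metadata" could mean, here is a minimal Python sketch. It assumes a well-formed `mimetype=count;...` string, and it assumes that articles are the `text/html` entries and media are the `image/`, `audio/` and `video/` entries. The function name and the sample string are hypothetical, and this is not the libkiwix code.

```python
def counts_from_counter(counter):
    """Return (article_count, media_count) from a well-formed Counter string."""
    articles = 0
    media = 0
    for entry in counter.split(";"):
        mime, _, count = entry.partition("=")
        if mime == "text/html":
            # Assumption: only plain text/html entries are articles.
            articles += int(count)
        elif mime.startswith(("image/", "audio/", "video/")):
            # Assumption: these three top-level types are "media".
            media += int(count)
    return articles, media


# Hypothetical sample with made-up small values.
sample = "image/png=3;image/webp=7;text/css=2;text/html=5"
print(counts_from_counter(sample))
```

With such a reading, redirects and non-HTML entries never enter the article count, whatever the total number of entries in the ZIM is.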
Is this really that important? There are really only 50k articles, I'm sure about that. The file size is correct. Who cares about the rest ...
And counter is even correct: https://browse.library.kiwix.org/raw/wikipedia_en_top_maxi_2025-06/meta/Counter
JFYI that was a comment I wrote on Reddit replying to someone who had queried the sudden change in the article count for top ZIMs. See https://www.reddit.com/r/Kiwix/comments/1lmmei1/why_do_the_new_best_of_wikipedia_zims_say_they/ . I personally don't care about it, but clearly some Redditors do, so I thought I'd give my best guess as to what might be going on.
I wouldn't see this as top priority, but if it's an easy fix, it should be fixed in due course. It is misleading to show 859,640 articles when there are in fact only 50,000.
And counter is even correct:
I cannot say for sure whether it is the reason for this bug, but this counter does not respect the spec in many parts of the string.
We should fix this, and there is a good chance that this will fix the bug.
Moving to MWoffliner.
@kelson42 can you explain what is wrong in the counter value so that we have a chance to fix it?
application/javascript=4;application/pdf=3;image/apng=1;image/gif=5166;image/jpeg=280;image/png=124;image/svg+xml=8;image/svg+xml; charset=utf-8; profile="https://www.mediawiki.org/wiki/Specs/SVG/1.0.0"=67381;image/webp=524540;text/css=28;text/html=50000;text/html; charset=iso-8859-1=1;text/javascript=3
Note that we do not set this Counter metadata in the mwoffliner scraper at all, so should something be wrong in it, this is at least half libzim's fault 🤣
Actually this is not checked properly in zimcheck either, therefore making a feature request.
@benoit74 The Counter metadata is indeed written by the libzim, based on the mime-types given by MWoffliner. See https://github.com/openzim/libzim/blob/main/src/writer/counterHandler.h for the exact piece of code.
In the string
application/javascript=4;application/pdf=3;image/apng=1;image/gif=5166;image/jpeg=280;image/png=124;image/svg+xml=8;image/svg+xml; charset=utf-8; profile="https://www.mediawiki.org/wiki/Specs/SVG/1.0.0"=67381;image/webp=524540;text/css=28;text/html=50000;text/html; charset=iso-8859-1=1;text/javascript=3
which is the Counter metadata for wikipedia_en_top_maxi_2025-06.zim, I see the following problems:
- image/svg+xml; which has no =xyz part ... and we already have an entry image/svg+xml=8. This seems to be a bug in the libzim... but probably triggered by an incongruity given by MWoffliner
- profile="https://www.mediawiki.org/wiki/Specs/SVG/1.0.0"=67381; where profile="https://www.mediawiki.org/wiki/Specs/SVG/1.0.0" does not look like a mime-type. Here I believe there is a bad handling of the mime-type parameter profile="https://www.mediawiki.org/wiki/Specs/SVG/1.0.0". Here again it looks more like a bug in the libzim
- Again a value without number at text/html;
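The reading above can be reproduced with a deliberately naive reader that splits on every `;` and expects each chunk to be `name=number`. This is only a sketch of that strict reading, not the actual libzim or libkiwix parser, and `naive_parse` is a hypothetical name.

```python
COUNTER = (
    'application/javascript=4;application/pdf=3;image/apng=1;image/gif=5166;'
    'image/jpeg=280;image/png=124;image/svg+xml=8;'
    'image/svg+xml; charset=utf-8; '
    'profile="https://www.mediawiki.org/wiki/Specs/SVG/1.0.0"=67381;'
    'image/webp=524540;text/css=28;text/html=50000;'
    'text/html; charset=iso-8859-1=1;text/javascript=3'
)


def naive_parse(counter):
    """Split on every ';' and keep only chunks shaped like name=number."""
    parsed = {}
    malformed = []
    for chunk in counter.split(";"):
        name, sep, value = chunk.partition("=")
        if sep and value.isdigit():
            parsed[name] = int(value)
        else:
            # No '=' at all, or the part after the first '=' is not a number.
            malformed.append(chunk)
    return parsed, malformed


parsed, malformed = naive_parse(COUNTER)
for chunk in malformed:
    print(repr(chunk))
```

The chunks that fail this strict reading are exactly the fragments that come from MIME-type parameters: the bare `image/svg+xml` and `text/html`, plus the `charset` and `profile` pieces.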
To conclude, I kind of agree that this is at least 90% a bug in the libzim... and therefore probably not a regression (which is surprising to me considering the visibility of the bug).
Who wants to code patches in C++? ;)
In the string
application/javascript=4;application/pdf=3;image/apng=1;image/gif=5166;image/jpeg=280;image/png=124;image/svg+xml=8;image/svg+xml; charset=utf-8; profile="https://www.mediawiki.org/wiki/Specs/SVG/1.0.0"=67381;image/webp=524540;text/css=28;text/html=50000;text/html; charset=iso-8859-1=1;text/javascript=3
which is the Counter metadata for wikipedia_en_top_maxi_2025-06.zim, I see the following problems:
- image/svg+xml; which has no =xyz part ... and we already have an entry image/svg+xml=8. This seems to be a bug in the libzim... but probably triggered by an incongruity given by MWoffliner
- profile="https://www.mediawiki.org/wiki/Specs/SVG/1.0.0"=67381; where profile="https://www.mediawiki.org/wiki/Specs/SVG/1.0.0" does not look like a mime-type. Here I believe there is a bad handling of the mime-type parameter profile="https://www.mediawiki.org/wiki/Specs/SVG/1.0.0". Here again it looks more like a bug in the libzim
It rather seems to me that this is a result of having a MIME-type string image/svg+xml; charset=utf-8; profile="https://www.mediawiki.org/wiki/Specs/SVG/1.0.0" for most (67381) of the SVGs. Similarly, there is one HTML page that comes with a MIME-type string of text/html; charset=iso-8859-1.
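This second reading can be checked with a parameter-aware split. The sketch below assumes, based only on this one string, that a new entry starts right after a `;` with no space and looks like `type/subtype`, while parameters follow `; ` with a space. `parse_counter` is a hypothetical helper, not a libzim or libkiwix function.

```python
import re

COUNTER = (
    'application/javascript=4;application/pdf=3;image/apng=1;image/gif=5166;'
    'image/jpeg=280;image/png=124;image/svg+xml=8;'
    'image/svg+xml; charset=utf-8; '
    'profile="https://www.mediawiki.org/wiki/Specs/SVG/1.0.0"=67381;'
    'image/webp=524540;text/css=28;text/html=50000;'
    'text/html; charset=iso-8859-1=1;text/javascript=3'
)


def parse_counter(counter):
    """Split only where ';' is directly followed by something like 'type/'."""
    counts = {}
    for entry in re.split(r";(?=[\w.+-]+/)", counter):
        # The count is whatever follows the LAST '=' of the entry, so '='
        # characters inside the parameters stay part of the MIME-type string.
        mime, _, count = entry.rpartition("=")
        counts[mime] = int(count)
    return counts


for mime, count in parse_counter(COUNTER).items():
    print(count, mime)
```

Read this way, the string holds 13 entries, each with a numeric count, and the two odd ones are simply full MIME-type strings that still carry their parameters.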
Yes, we have to remove all MIME-type parameters.
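A minimal sketch of what removing the parameters before counting could look like, written in Python for brevity even though the real fix would be in C++. Both function names are hypothetical, and lowercasing the base type is an extra assumption on top of what is discussed here.

```python
from collections import Counter


def base_mime_type(mime):
    """Drop everything after the first ';' (charset, profile, ...)."""
    return mime.split(";", 1)[0].strip().lower()


def build_counter(mime_types):
    """Tally entries by base MIME type and serialise as 'type=count;...'."""
    tally = Counter(base_mime_type(m) for m in mime_types)
    return ";".join(f"{mime}={count}" for mime, count in sorted(tally.items()))


# The MIME-type strings below are the ones quoted in this thread.
entries = [
    "text/html",
    "text/html; charset=iso-8859-1",
    "image/svg+xml",
    'image/svg+xml; charset=utf-8; profile="https://www.mediawiki.org/wiki/Specs/SVG/1.0.0"',
]
print(build_counter(entries))
```

With the parameters stripped, the two SVG variants fall under a single image/svg+xml entry and the stray HTML page is counted together with the other text/html entries.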