libzim
We have managed to create an incoherent ZIM file
It looks like http://tmp.kiwix.org/tutu.zim contains Xapian index entries that point to articles which are not in the ZIM file. This should not be possible at all. The file was created with libzim 5.0.0.
See https://github.com/kiwix/kiwix-tools/issues/299
After investigating a bit: we could get this behavior if:
- redirect articles are indexed (`shouldIndex()` returns true), and
- the redirect is invalid (i.e. it redirects to a non-existent article).
In this case, we index the article (title only, since there is no content) and afterwards remove it from the ZIM file.
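The order of operations described above can be illustrated with a deliberately simplified model. This is a hypothetical sketch, not libzim code: `create_zim_eagerly`, the `(path, kind, target)` tuples and the plain `index` list are invented stand-ins for the creator, its entries and the Xapian index. It only shows why indexing at add time and pruning invalid redirects later leaves a dangling index entry.

```python
# Hypothetical model of the problematic sequence; none of these names are libzim APIs.

def create_zim_eagerly(items):
    """items: list of (path, kind, target), kind being 'article' or 'redirect'."""
    entries = {}
    index = []  # paths referenced by the (simulated) full-text index
    for path, kind, target in items:
        entries[path] = (kind, target)
        index.append(path)  # indexed immediately, as if shouldIndex() returned true
    # Invalid redirects are only detected and removed at the end...
    for path, (kind, target) in list(entries.items()):
        if kind == "redirect" and target not in entries:
            del entries[path]
    # ...but their index entries have already been written.
    return entries, index


items = [
    ("A/Paris", "article", None),
    ("A/Lutetia", "redirect", "A/Paris"),
    ("A/Old_Name", "redirect", "A/Missing"),  # invalid: target does not exist
]
entries, index = create_zim_eagerly(items)
dangling = [p for p in index if p not in entries]
print(dangling)  # ['A/Old_Name']
```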
I heard there was an issue on the mwoffliner side with redirect articles. Is it the one described here, or is there another bug we have to find? (We still have to fix libzim to handle this case, but at least we would have found the bug.)
CC @ISNIT0 @kelson42
Related to kiwix/kiwix-lib#222
@mgautierfr No chance to verify first if the redirect is valid? That way we might avoid the incoherence in any case. Otherwise, I'm OK with closing this ticket. This is fundamentally a problem with mwoffliner.
> No chance to verify first if the redirect is valid
Yes, it is fixable: we simply must not index redirect articles when mwoffliner gives them to us, but only after we have resolved the redirects (and removed the invalid ones). The question is: "Is this what happened when creating tutu.zim?" If so, we can fix it; if not, we have to investigate further because there is another bug.
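A minimal sketch of the proposed ordering, resolve and prune redirects first, index afterwards, using the same kind of hypothetical model (again, `create_zim_resolving_first` and the tuple layout are illustrative assumptions, not the libzim API). The pruning loop repeats until nothing changes, so a chain such as `A/Older_Name -> A/Old_Name -> (missing)` is removed entirely rather than leaving a new dangling redirect behind.

```python
# Hypothetical model of the fix; none of these names are libzim APIs.

def create_zim_resolving_first(items):
    """items: list of (path, kind, target), kind being 'article' or 'redirect'."""
    entries = {path: (kind, target) for path, kind, target in items}
    # Remove invalid redirects until a fixed point is reached, so that
    # redirects pointing at other invalid redirects are removed as well.
    changed = True
    while changed:
        changed = False
        for path, (kind, target) in list(entries.items()):
            if kind == "redirect" and target not in entries:
                del entries[path]
                changed = True
    # Index only now: every indexed path is guaranteed to exist.
    index = list(entries)
    return entries, index


items = [
    ("A/Paris", "article", None),
    ("A/Lutetia", "redirect", "A/Paris"),
    ("A/Old_Name", "redirect", "A/Missing"),
    ("A/Older_Name", "redirect", "A/Old_Name"),
]
entries, index = create_zim_resolving_first(items)
print(index)  # ['A/Paris', 'A/Lutetia']
```

Doing the indexing as a separate final pass costs one extra iteration over the entries, but it makes "every index entry points to an existing entry" hold by construction instead of depending on the input being clean.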
@mgautierfr I strongly suspect this is what happened because:
- there are no redirects in the ZIM file,
- we got this kind of error during the ZIM creation,
- zimcheck reported a lot of dead links, all of which were redirect links.
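The symptoms listed above can be expressed as a post-hoc coherence check. The following is a hypothetical sketch over the same simplified model (it is not how zimcheck is implemented and does not read real ZIM files): it reports index entries whose path is missing and redirects whose target is missing.

```python
# Hypothetical coherence check over the simplified model; not zimcheck's implementation.

def find_incoherences(entries, index):
    """Return (index paths missing from entries, redirects with a missing target)."""
    dangling_index = [p for p in index if p not in entries]
    dead_redirects = [
        p
        for p, (kind, target) in entries.items()
        if kind == "redirect" and target not in entries
    ]
    return dangling_index, dead_redirects


# Roughly the state reported for tutu.zim: the invalid redirect was removed
# from the entries, but its index entry survived.
entries = {"A/Paris": ("article", None), "A/Lutetia": ("redirect", "A/Paris")}
index = ["A/Paris", "A/Lutetia", "A/Old_Name"]
print(find_incoherences(entries, index))  # (['A/Old_Name'], [])
```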
@mgautierfr Do we have some kind of safeguard against this in the meantime? Or do we still index redirects straight away and create the problem?
Nothing has changed here; the problem is still there.
We had a new case which has caused us a bit of work: https://github.com/kiwix/apple/pull/637