libzim

We have managed to create an incoherent ZIM file

Open • kelson42 opened this issue 6 years ago • 8 comments

It looks like http://tmp.kiwix.org/tutu.zim contains Xapian index entries that point to articles which are not in the ZIM file. This should not be possible at all. The file was created with libzim 5.0.0.
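A check for this kind of incoherence boils down to a set difference: every path referenced by the full-text index must exist as an entry in the archive. The sketch below is a minimal model under stated assumptions. The index is represented as a plain mapping from hypothetical document ids to entry paths and the archive as a set of paths. It does not use the real libzim or Xapian APIs, and the function name and example paths are made up for illustration.

```python
from typing import List, Mapping, Set


def find_dangling_index_entries(index_docs: Mapping[int, str],
                                archive_paths: Set[str]) -> List[str]:
    """Return the index paths that have no matching entry in the archive.

    `index_docs` is a hypothetical stand-in for the Xapian index
    (document id -> entry path); `archive_paths` stands in for the set
    of entry paths actually stored in the ZIM file.
    """
    dangling = {path for path in index_docs.values() if path not in archive_paths}
    return sorted(dangling)


if __name__ == "__main__":
    # Illustrative data only: one index entry points to a removed redirect.
    index = {1: "A/Paris", 2: "A/Paris_(France)", 3: "A/Lyon"}
    archive = {"A/Paris", "A/Lyon"}
    print(find_dangling_index_entries(index, archive))  # ['A/Paris_(France)']
```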

See https://github.com/kiwix/kiwix-tools/issues/299

kelson42, Jun 12 '19

After investigating a bit, we could get this behavior if:

  • Redirect articles are indexed (shouldIndex() returns true)
  • The redirect is invalid (i.e., it redirects to a nonexistent article)

In this case, we index the article (only its title, since a redirect has no content) and afterwards remove it from the ZIM file.
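The ordering problem can be reproduced with a small simulation. This is a hypothetical model, not libzim's creator code: `Entry`, `create_buggy` and the dict-based index are illustrative names. Each entry is indexed as soon as it is added (redirects included, as when `shouldIndex()` returns true for them), and invalid redirects are only dropped when the archive is finalized. Their title-only index entries are therefore left pointing at paths that no longer exist.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Entry:
    path: str
    title: str
    redirect_target: Optional[str] = None  # None means a regular article


def create_buggy(entries: List[Entry]) -> Tuple[Set[str], Dict[str, str]]:
    """Hypothetical model of the problematic ordering.

    Returns (paths stored in the archive, index mapping title -> path).
    Only single-level redirects are checked, which is enough to show the bug.
    """
    index: Dict[str, str] = {}
    stored: Dict[str, Entry] = {}
    for entry in entries:
        stored[entry.path] = entry
        # Indexed immediately: a redirect only contributes its title.
        index[entry.title] = entry.path
    # Finalization: drop redirects whose target does not exist...
    invalid = [p for p, e in stored.items()
               if e.redirect_target is not None and e.redirect_target not in stored]
    for path in invalid:
        del stored[path]
    # ...but the index is never updated, so it may reference removed paths.
    return set(stored), index


if __name__ == "__main__":
    entries = [
        Entry("A/Lyon", "Lyon"),
        Entry("A/Lyon_(city)", "Lyon (city)", redirect_target="A/Lyon"),
        Entry("A/Paris_(France)", "Paris (France)", redirect_target="A/Paris"),  # target missing
    ]
    stored, index = create_buggy(entries)
    print(sorted(stored))                     # ['A/Lyon', 'A/Lyon_(city)']
    print(index["Paris (France)"] in stored)  # False: a dangling index entry
```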

I heard there was an issue on the mwoffliner side with redirect articles. Is it the one described here, or is there another bug we still have to find? (We still have to fix libzim to handle this case, but at least we would have found the bug.)

CC @ISNIT0 @kelson42

mgautierfr, Jun 12 '19

Related to kiwix/kiwix-lib#222

mgautierfr, Jun 12 '19

@mgautierfr No chance to verify first if the redirect is valid? That way we would avoid the incoherence in every case. Otherwise, I'm OK with closing this ticket. This is fundamentally a problem with mwoffliner.

kelson42, Jun 12 '19

> No chance to verify first if the redirect is valid

Yes, it is fixable: we simply must not index redirect articles when mwoffliner gives them to us, but only after we have resolved the redirects (and removed the invalid ones). The question is: "Is this what happened when creating tutu.zim?" (If so, we can fix it; if not, we have to investigate further because there is another bug.)
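The proposed fix (index redirects only after they have been resolved, and drop invalid ones first) can be sketched with the same kind of hypothetical model. `Entry`, `resolves` and `create_fixed` are illustrative names, not libzim API. All entries are collected first, each redirect chain is followed until it reaches a regular article, redirects ending at a missing target or in a loop are discarded, and only the surviving entries are indexed.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Entry:
    path: str
    title: str
    redirect_target: Optional[str] = None  # None means a regular article


def resolves(path: str, by_path: Dict[str, Entry]) -> bool:
    """True if following redirects from `path` ends at an existing article."""
    seen: Set[str] = set()
    while path in by_path and path not in seen:
        seen.add(path)
        target = by_path[path].redirect_target
        if target is None:
            return True  # reached a regular article
        path = target
    return False  # missing target or redirect loop


def create_fixed(entries: List[Entry]) -> Tuple[Set[str], Dict[str, str]]:
    """Hypothetical model of the corrected ordering: resolve, prune, then index."""
    by_path = {e.path: e for e in entries}
    kept = [e for e in entries if resolves(e.path, by_path)]
    # Indexing happens last, so it only sees entries that are actually stored.
    index = {e.title: e.path for e in kept}
    return {e.path for e in kept}, index


if __name__ == "__main__":
    entries = [
        Entry("A/Lyon", "Lyon"),
        Entry("A/Lyon_(city)", "Lyon (city)", redirect_target="A/Lyon"),
        Entry("A/Paris_(France)", "Paris (France)", redirect_target="A/Paris"),  # target missing
    ]
    stored, index = create_fixed(entries)
    print(sorted(stored))                            # ['A/Lyon', 'A/Lyon_(city)']
    print(all(p in stored for p in index.values()))  # True
```

The design choice is simply to make indexing the last step of finalization, so the index can never refer to an entry that pruning has removed.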

mgautierfr, Jun 12 '19

@mgautierfr I strongly suspect this is what happened, because:

  • there is no redirect at all in the ZIM file
  • we had errors of this kind during the ZIM creation
  • zimcheck reported a lot of dead links, all of which were redirect links

kelson42, Jun 12 '19

@mgautierfr Do we have some kind of safeguard against this by now? Or do we still index redirects straight away and create the problem?

kelson42, Jul 10 '20

Nothing has changed here. The problem is still there.

mgautierfr, Jul 10 '20

We had a new case which caused us quite a bit of work: https://github.com/kiwix/apple/pull/637

kelson42, Mar 05 '24