
Introduce new tool zimtweak to manipulate a ZIM file

Open kelson42 opened this issue 5 years ago • 4 comments

One of the problems we have with custom apps is that the content tends to be too big, and part of it is unused. I'm talking about the full-text index (not the title index in Xapian format).

I propose to introduce a tool which would be able to remove a specific article easily/quickly.

This tool could then be extended with other features (like adding an article).

kelson42 avatar Mar 24 '20 08:03 kelson42

@mgautierfr Feedback welcome on this idea.

kelson42 avatar Mar 24 '20 08:03 kelson42

@mgautierfr Or maybe this should be an option of zimrecreate?

kelson42 avatar Apr 01 '20 10:04 kelson42

This could be interesting, but it's not so easy.

Adding or removing an article means that we have to rewrite all the index parts (urlPtrPos, titleIndex, dirent, clusterPtrPos, ...), and all of them are internal to libzim. Removing may be a bit simpler, as we could simply remove the cluster without changing the indexes and mark the article as deleted in the dirent (there is at least a use case for this). However, this is tricky, as most of the implementation/user code of libzim probably does not handle this.
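To make the pointers mentioned above concrete, here is a minimal sketch that reads them from a ZIM header. It assumes the 80-byte little-endian header layout as I understand it from the openZIM file format documentation; the names `ZimHeader` and `parse_header` are hypothetical and are not part of libzim.

```python
import struct
from typing import NamedTuple

# Assumed layout: 80-byte little-endian header (per the openZIM format docs).
ZIM_MAGIC = 0x044D495A
HEADER_FORMAT = "<IHH16sIIQQQQIIQ"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 80 bytes, no padding


class ZimHeader(NamedTuple):
    """Hypothetical container for the header fields (not a libzim type)."""
    magic: int
    major_version: int
    minor_version: int
    uuid: bytes
    entry_count: int
    cluster_count: int
    url_ptr_pos: int      # offset of the URL-ordered dirent pointer list
    title_ptr_pos: int    # offset of the title-ordered index
    cluster_ptr_pos: int  # offset of the cluster pointer list
    mime_list_pos: int
    main_page: int
    layout_page: int
    checksum_pos: int


def parse_header(data: bytes) -> ZimHeader:
    """Unpack the first 80 bytes of a ZIM file into a ZimHeader."""
    if len(data) < HEADER_SIZE:
        raise ValueError("not enough data for a ZIM header")
    header = ZimHeader(*struct.unpack(HEADER_FORMAT, data[:HEADER_SIZE]))
    if header.magic != ZIM_MAGIC:
        raise ValueError("bad magic number, not a ZIM file")
    return header
```

Every offset in this header points into the rest of the file, so dropping or adding an entry shifts the data behind it and all of these positions (plus the pointer lists they refer to) would have to be rewritten consistently.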

And of course, we would have to recompute the checksum, and the file should probably get a new UUID. We would also have to change the tags to remove _ftindex. And...
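As a rough illustration of the checksum and UUID steps, the sketch below works on an in-memory copy of a file. It assumes that the UUID sits at byte offset 8 of the header, that `checksumPos` is a 64-bit little-endian value at offset 72, and that the checksum is a 16-byte MD5 of everything before `checksumPos` stored at the very end of the file. The function name `refresh_uuid_and_checksum` is hypothetical, and the sketch does not touch the `_ftindex` tag or any index.

```python
import hashlib
import struct
import uuid

# Assumed header offsets (openZIM format as I understand it).
UUID_OFFSET = 8
CHECKSUM_POS_OFFSET = 72
MD5_SIZE = 16


def refresh_uuid_and_checksum(data: bytes) -> bytes:
    """Return a copy of `data` with a fresh UUID and a recomputed MD5 checksum."""
    buf = bytearray(data)
    (checksum_pos,) = struct.unpack_from("<Q", buf, CHECKSUM_POS_OFFSET)
    if checksum_pos + MD5_SIZE != len(buf):
        raise ValueError("checksum is expected to be the last 16 bytes")
    # Give the modified file its own identity.
    buf[UUID_OFFSET:UUID_OFFSET + 16] = uuid.uuid4().bytes
    # The checksum covers everything before checksum_pos, including the new UUID.
    buf[checksum_pos:] = hashlib.md5(buf[:checksum_pos]).digest()
    return bytes(buf)
```

A real tool would stream the file instead of loading it into memory, since ZIM files can be many gigabytes.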

mgautierfr avatar Apr 06 '20 17:04 mgautierfr

Hello, I wrote some tools in Python dedicated to this task: https://gitlab.com/Afrikalan/zim-tools/-/tree/master/zim-manipulation Zim cut index does this job. It is based on a handmade lib named zimDerivate.py that allows changing articles, changing the ZIM compression, and many other things...

moussaCamara avatar Dec 21 '21 06:12 moussaCamara