Introduce new tool zimtweak to manipulate a ZIM file
One of the problems we have with custom apps is that the content tends to be too big, yet part of it is unused: I'm talking about the full-text index (not the Xapian title index).
I propose to introduce a tool that would be able to remove a specific article easily and quickly.
This tool could then be extended with other features (like adding an article).
@mgautierfr Feedback welcome on this idea.
@mgautierfr Or maybe this should be an option of zimrecreate?
This could be interesting, but it's not so easy.
Adding or removing an article means that we have to rewrite all the index parts (urlPtrPos, titleIndex, dirent, clusterPtrPos, ...), and all of them are internal to libzim. Removing may be a bit simpler, as we could just remove the cluster without changing the indexes and mark the article as deleted in the dirent (that would at least be one use case for this). However, this is tricky, as most of the libzim implementation and user code probably do not handle this.
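To illustrate why removing an article ripples through the index parts, here is a toy sketch, not libzim code. It models only two lists: entries ordered by URL, and a title-ordered list of positions into that URL list (loosely standing in for the pointer lists at urlPtrPos and titlePtrPos). `ToyIndex` and `remove_entry` are hypothetical names. Removing one entry forces every later title pointer to be renumbered; a real file would additionally need dirent offsets, redirect targets and cluster pointers rewritten.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyIndex:
    """Hypothetical in-memory model, not the real ZIM layout.

    urls: entry URLs in URL order (stands in for the list at urlPtrPos).
    title_index: positions into `urls`, in title order (stands in for titlePtrPos).
    """
    urls: List[str] = field(default_factory=list)
    title_index: List[int] = field(default_factory=list)


def remove_entry(index: ToyIndex, entry: int) -> ToyIndex:
    """Return a new ToyIndex with the entry at URL position `entry` removed."""
    if not 0 <= entry < len(index.urls):
        raise IndexError(entry)
    new_urls = index.urls[:entry] + index.urls[entry + 1:]
    new_title_index = []
    for pos in index.title_index:
        if pos == entry:
            continue  # drop the title pointer of the removed entry
        # every entry after the removed one moves up by one in URL order
        new_title_index.append(pos - 1 if pos > entry else pos)
    return ToyIndex(new_urls, new_title_index)


if __name__ == "__main__":
    idx = ToyIndex(["A/a", "A/b", "A/c"], [2, 0, 1])
    print(remove_entry(idx, 1))
```

Even in this reduced model, a single removal touches every title pointer that refers to a later entry, which is why in-place removal usually means rewriting the index lists rather than patching one record.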
And of course, we would have to recompute the checksum, and the file should probably get a new UUID. We would also have to change the tags to remove _ftindex. And so on...
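A minimal sketch of the checksum and UUID part, assuming the 80-byte little-endian header layout from the openZIM file format documentation (magic number, version, 16-byte UUID, counts, pointer-list positions, main/layout page, then checksumPos) and assuming the checksum is the MD5 of every byte before checksumPos, stored as 16 bytes at checksumPos. The function names are hypothetical, and the sketch does not touch the tags, which would still need editing to drop _ftindex. Note the order: the UUID lives inside the checksummed range, so it must be patched before the checksum is recomputed.

```python
import hashlib
import struct
import uuid

# Assumed header layout (little-endian), per the openZIM format documentation.
HEADER_FORMAT = "<IHH16sIIQQQQIIQ"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 80 bytes
ZIM_MAGIC = 72173914
UUID_OFFSET = 8  # after magic (4 bytes) + major (2) + minor (2)
FIELDS = ("magic", "major", "minor", "uuid", "article_count", "cluster_count",
          "url_ptr_pos", "title_ptr_pos", "cluster_ptr_pos", "mime_list_pos",
          "main_page", "layout_page", "checksum_pos")


def parse_header(data: bytes) -> dict:
    """Decode the fixed-size header at the start of the file."""
    header = dict(zip(FIELDS, struct.unpack_from(HEADER_FORMAT, data, 0)))
    if header["magic"] != ZIM_MAGIC:
        raise ValueError("not a ZIM file")
    return header


def recompute_checksum(data: bytes) -> bytes:
    """MD5 digest of every byte before checksumPos."""
    pos = parse_header(data)["checksum_pos"]
    return hashlib.md5(data[:pos]).digest()


def checksum_ok(data: bytes) -> bool:
    """Compare the stored 16-byte checksum with a freshly computed one."""
    pos = parse_header(data)["checksum_pos"]
    return data[pos:pos + 16] == recompute_checksum(data)


def with_new_checksum(data: bytes) -> bytes:
    """Return a copy of the file with the checksum at checksumPos rewritten."""
    pos = parse_header(data)["checksum_pos"]
    return data[:pos] + recompute_checksum(data) + data[pos + 16:]


def with_new_uuid(data: bytes) -> bytes:
    """Give the file a fresh random UUID, then fix the checksum it invalidates."""
    patched = data[:UUID_OFFSET] + uuid.uuid4().bytes + data[UUID_OFFSET + 16:]
    return with_new_checksum(patched)
```

On a real file this would be used as `with_new_uuid(open(path, "rb").read())` after the content edits are done; for large ZIM files a streaming MD5 over the file would be preferable to loading it whole.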
Hello, I wrote some Python tools dedicated to this task: https://gitlab.com/Afrikalan/zim-tools/-/tree/master/zim-manipulation Zim cut index does this job; it is based on a handmade library named zimDerivate.py that allows changing articles, changing the ZIM compression and many other things...