Fix Counter metadata by normalizing MIME types and removing malformed entries (https://github.com/openzim/zim-tools/issues/473)
Fixes https://github.com/openzim/libzim/issues/1000
This patch strips MIME parameters (e.g., charset, profile) to normalize MIME types, removes duplicates, and filters out malformed or incomplete entries in the Counter metadata (e.g., entries without `=count` or with an invalid type/subtype format).
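A minimal Python sketch of the filtering described above, assuming the `Counter` value is a `;`-separated list of `type/subtype=count` entries. The function name `filter_counter`, the validity regex, and the "first count wins" rule for duplicates are illustrative assumptions, not the PR's actual code. Because MIME parameters are also introduced by `;`, an entry written with parameters falls apart into malformed pieces under this scheme and is discarded.

```python
from __future__ import annotations

import re

# Assumed validity rule (not taken from libzim): a bare MIME type is
# "type/subtype", each part built from RFC 6838-style name characters.
MIME_RE = re.compile(r"[a-z0-9][a-z0-9!#$&^_.+-]*/[a-z0-9][a-z0-9!#$&^_.+-]*")
# An entry must end in "=<decimal count>"; everything before the last '=' is the type.
ENTRY_RE = re.compile(r"(.+)=([0-9]+)")


def filter_counter(counter: str) -> dict[str, int]:
    """Keep only well-formed 'type/subtype=count' entries of a Counter value."""
    result: dict[str, int] = {}
    for piece in counter.split(";"):  # assumed entry separator
        match = ENTRY_RE.fullmatch(piece.strip())
        if match is None:
            continue  # incomplete entry: no '=count' part
        mime = match.group(1).strip().lower()
        if not MIME_RE.fullmatch(mime):
            continue  # invalid type/subtype format
        result.setdefault(mime, int(match.group(2)))  # duplicates: first count wins
    return result
```

For example, `filter_counter("text/html;charset=utf-8=3;image/png=2;image/png=2")` keeps only `{"image/png": 2}`: the three parameterized `text/html` items vanish from the result.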
@kelson42 I think that before fixing the problem with Counter metadata we must define how we want it to be fixed.
> @kelson42 I think that before fixing the problem with Counter metadata we must define how we want it to be fixed.
Seems straightforward to me. What is unclear?
@kelson42 I have no problem with the part of the solution that deals with stripping the MIME-type parameters during ZIM creation. But what should we do with the unstripped MIME types recorded in the MIME-type list and Counter metadata of existing ZIM files? Maybe we should just acknowledge such ZIM files as buggy and refrain from having newer versions of libzim heal them on the fly, as this PR attempts to do? BTW, this PR addresses only the Counter metadata, and in a way that results in dropping (rather than correcting) those MIME types that have been entered into the Counter metadata with parameters.
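The creation-time part of the fix, stripping parameters before anything is counted, could look roughly like the following sketch. `MimeTypeCounter`, `add_item`, `to_metadata`, and `strip_mime_params` are hypothetical names for illustration, not libzim's writer API; the `type/subtype=count` entries joined by `;` with sorted keys are an assumed serialization, and lowercasing relies on MIME types being case-insensitive.

```python
from collections import Counter


def strip_mime_params(mime: str) -> str:
    """'text/html; charset=utf-8' -> 'text/html' (parameters dropped, lowercased)."""
    return mime.split(";", 1)[0].strip().lower()


class MimeTypeCounter:
    """Hypothetical writer-side helper: counts items by bare MIME type."""

    def __init__(self) -> None:
        self._counts = Counter()

    def add_item(self, mime: str) -> None:
        # Normalize before counting, so parameters never reach the metadata.
        self._counts[strip_mime_params(mime)] += 1

    def to_metadata(self) -> str:
        # 'type/subtype=count' entries joined with ';', sorted for stable output.
        return ";".join(f"{m}={n}" for m, n in sorted(self._counts.items()))
```

Adding items with `"text/html; charset=utf-8"` and `"text/html"` then produces a single `text/html=2` entry instead of two entries, one of which contains a `;` that breaks parsing.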
> @kelson42 I have no problem with the part of the solution that deals with stripping the MIME-type parameters during ZIM creation. But what should we do with the unstripped MIME types recorded in the MIME-type list and Counter metadata of existing ZIM files? Maybe we should just acknowledge such ZIM files as buggy and refrain from having newer versions of libzim heal them on the fly, as this PR attempts to do?
We need to be a bit flexible here. I'm not even sure we should consider MIME-type parameters wrong "from a ZIM perspective".
> BTW, this PR addresses only the Counter metadata, and in a way that results in dropping (rather than correcting) those MIME types that have been entered into the Counter metadata with parameters.
We should not ignore or drop them; we should count them all together (based on the MIME type). @Aevil1 Can you fix that, please?
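One way to count such entries together when reading an existing `Counter` value is to re-attach the parameter pieces that a naive `;` split tears off, then sum counts per bare MIME type. This sketch assumes the same `;`-separated `type/subtype=count` format. The rule that a piece starts a new entry only when the text before its first `=` contains `/` is a heuristic assumption (MIME parameter names cannot contain `/`, but the format remains ambiguous in general), and `aggregate_counter` is a hypothetical helper, not existing libzim code.

```python
from __future__ import annotations

import re

# An entry ends in "=<decimal count>"; everything before the last '=' is the
# (possibly parameterized) MIME type.
ENTRY_RE = re.compile(r"(.+)=([0-9]+)")


def base_mime(mime: str) -> str:
    """Strip parameters such as '; charset=utf-8' and lowercase the type."""
    return mime.split(";", 1)[0].strip().lower()


def aggregate_counter(counter: str) -> dict[str, int]:
    """Sum Counter entries that share the same bare MIME type."""
    # 1. Rebuild entries that were split apart at the ';' before a parameter.
    entries: list[str] = []
    for piece in counter.split(";"):
        head = piece.split("=", 1)[0]
        if "/" in head or not entries:
            entries.append(piece)  # looks like the start of "type/subtype..."
        else:
            entries[-1] += ";" + piece  # assumed to be a parameter of the previous entry

    # 2. Parse each rebuilt entry and add its count to the bare type's total.
    totals: dict[str, int] = {}
    for entry in entries:
        match = ENTRY_RE.fullmatch(entry.strip())
        if match is None:
            continue  # no trailing '=count': nothing to recover
        mime = base_mime(match.group(1))
        if "/" not in mime:
            continue  # not a type/subtype pair
        totals[mime] = totals.get(mime, 0) + int(match.group(2))
    return totals
```

With this approach, `aggregate_counter("text/html;charset=utf-8=3;text/html=5;image/png=2")` yields `{"text/html": 8, "image/png": 2}`: the parameterized entry is merged into `text/html` rather than dropped.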
@Aevil1 Still motivated to complete the PR?
@veloman-yunkan I guess we will have to close this PR and implement the fix in a new PR :(