Fix Counter metadata by normalizing MIME types and removing malformed entries (https://github.com/openzim/zim-tools/issues/473)

Open Aevil1 opened this issue 5 months ago • 6 comments

Fixes https://github.com/openzim/libzim/issues/1000

This patch strips MIME parameters (e.g., charset, profile) to normalize MIME types, removes duplicates, and filters out malformed or incomplete entries in the Counter metadata (e.g., entries without =count or invalid type/subtype format).
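To make the description concrete, here is a minimal sketch of the two building blocks it mentions: stripping parameters and checking the `type/subtype` shape. The helper names `stripMimeParameters` and `looksLikeMimeType` are hypothetical and are not taken from libzim or from this patch. The shape check is deliberately loose and is not a full RFC-style validation.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper (not part of libzim): returns the bare "type/subtype"
// part of a MIME type, dropping everything from the first ';' onwards and
// trimming surrounding whitespace.
std::string stripMimeParameters(const std::string& mime) {
    std::string base = mime.substr(0, mime.find(';'));
    auto notSpace = [](unsigned char c) { return !std::isspace(c); };
    base.erase(base.begin(), std::find_if(base.begin(), base.end(), notSpace));
    base.erase(std::find_if(base.rbegin(), base.rend(), notSpace).base(), base.end());
    return base;
}

// Hypothetical check: exactly one '/', with a non-empty type before it
// and a non-empty subtype after it.
bool looksLikeMimeType(const std::string& s) {
    const auto slash = s.find('/');
    return slash != std::string::npos
        && slash > 0
        && slash + 1 < s.size()
        && s.find('/', slash + 1) == std::string::npos;
}
```

Under these assumptions, `stripMimeParameters("text/html; charset=utf-8")` yields `text/html`, and `looksLikeMimeType("text")` is rejected because it has no subtype.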

Aevil1 avatar Jun 30 '25 19:06 Aevil1

@kelson42 I think that before fixing the problem with Counter metadata we must define how we want it to be fixed.

veloman-yunkan avatar Jul 03 '25 13:07 veloman-yunkan

> @kelson42 I think that before fixing the problem with Counter metadata we must define how we want it to be fixed.

Seems straightforward to me, what is unclear?

kelson42 avatar Jul 08 '25 04:07 kelson42

@kelson42 I have no problem with the part of the solution that deals with stripping of the MIME-type parameters during ZIM creation. But what should we do with unstripped MIME-types recorded in the MIME-type list and Counter metadata in existing ZIM-files? Maybe we should just acknowledge such ZIM-files as buggy and refrain from healing them on-the-fly by newer versions of libzim as attempted in this PR? BTW, this PR addresses the Counter metadata only and in a way that results in dropping (rather than correcting) those MIME-types that have been entered into the Counter metadata with parameters.

veloman-yunkan avatar Jul 10 '25 08:07 veloman-yunkan

> @kelson42 I have no problem with the part of the solution that deals with stripping of the MIME-type parameters during ZIM creation. But what should we do with unstripped MIME-types recorded in the MIME-type list and Counter metadata in existing ZIM-files? Maybe we should just acknowledge such ZIM-files as buggy and refrain from healing them on-the-fly by newer versions of libzim as attempted in this PR?

We need to be a bit flexible here. I'm not even sure we should consider MIME-type parameters as wrong "from a ZIM perspective".

> BTW, this PR addresses the Counter metadata only and in a way that results in dropping (rather than correcting) those MIME-types that have been entered into the Counter metadata with parameters.

We should not ignore or drop them; we should count them all together (based on the MIME type). @Aevil1 Can you fix that please?
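A rough sketch of what "count them all together" could look like, assuming the Counter entries have already been parsed into (MIME type, count) pairs and that the serialized form is `type=count` items joined by `;`. The names `CounterEntries`, `baseMimeType`, `mergeByBaseType` and `serializeCounter` are hypothetical illustrations, not libzim API. Parsing an existing Counter string whose MIME types themselves contain `;` is intentionally left out.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical representation of already-parsed Counter entries.
using CounterEntries = std::vector<std::pair<std::string, std::uint64_t>>;

// Hypothetical helper: keep only the part before the first ';'
// and drop trailing spaces.
std::string baseMimeType(const std::string& mime) {
    std::string base = mime.substr(0, mime.find(';'));
    while (!base.empty() && base.back() == ' ') base.pop_back();
    return base;
}

// Sum the counts of all entries that share the same base MIME type,
// so parameterized variants are merged rather than dropped.
std::map<std::string, std::uint64_t> mergeByBaseType(const CounterEntries& entries) {
    std::map<std::string, std::uint64_t> merged;
    for (const auto& entry : entries) {
        merged[baseMimeType(entry.first)] += entry.second;
    }
    return merged;
}

// Serialize as "type=count;type=count" (assumed Counter layout),
// in the map's sorted key order.
std::string serializeCounter(const std::map<std::string, std::uint64_t>& merged) {
    std::string out;
    for (const auto& item : merged) {
        if (!out.empty()) out += ';';
        out += item.first + '=' + std::to_string(item.second);
    }
    return out;
}
```

With this sketch, 3 entries of `text/html` and 2 entries of `text/html;charset=utf-8` end up as a single `text/html=5` item instead of one of them being discarded.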

kelson42 avatar Sep 22 '25 15:09 kelson42

@Aevil1 Still motivated to complete the PR?

kelson42 avatar Sep 28 '25 12:09 kelson42

@veloman-yunkan I guess we will have to close this PR and implement the fix in a new PR :(

kelson42 avatar Oct 01 '25 16:10 kelson42