python-libzim icon indicating copy to clipboard operation
python-libzim copied to clipboard

Add support for new illustrations APIs in libzim 9.4.0

Open benoit74 opened this issue 2 months ago • 2 comments

Fix #232

To be merged after https://github.com/openzim/python-libzim/pull/234 (which is simpler / narrower scope)

Changes:

  • new illustration module with the IllustrationInfo class, used by reader and writer modules
  • new signatures of reader.Archive.get_illustration_item: get "default" and get specific sizes/scales
  • new method reader.Archive.get_illustrations_infos
  • new signature of writer.Creator.add_illustration : pass detailed illustration infos instead of one fixed size

benoit74 avatar Nov 03 '25 21:11 benoit74

Codecov Report

:x: Patch coverage is 98.52941% with 1 line in your changes missing coverage. Please review. :white_check_mark: Project coverage is 94.23%. Comparing base (0f0820a) to head (7db4684). :warning: Report is 1 commits behind head on libzim_9.4.0.

Files with missing lines Patch % Lines
libzim/libzim.pyx 98.52% 1 Missing :warning:
Additional details and impacted files
@@               Coverage Diff                @@
##           libzim_9.4.0     #233      +/-   ##
================================================
+ Coverage         93.77%   94.23%   +0.46%     
================================================
  Files                 1        1              
  Lines               546      607      +61     
================================================
+ Hits                512      572      +60     
- Misses               34       35       +1     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

:rocket: New features to boost your workflow:
  • :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

codecov[bot] avatar Nov 06 '25 14:11 codecov[bot]

Ready for review, wheels build still failing on ubuntu arm64 for "known reasons" but rest is in place so ready to be analyzed

benoit74 avatar Nov 06 '25 14:11 benoit74

Thank you!

Regarding addIllustrations, there is "only" four variants available in the libzim: https://github.com/openzim/libzim/blob/8ef511ba0de6a8180d79fbc3c7edcfec4202ae5e/include/zim/writer/creator.h#L165-L197 ; I did not exposed the "contentProvider" variants because the one previously present was not exposed either. I could add the two missing, but it will change nothing to your point.

I generally second your remarks and frustrations about how "complex" stuff still is with current PR, but I considered we've agreed python-libzim should be a very thin wrapper on top of the libzim, and everything more "complex" should go into the python-scraperlib. I feel like your python-glue is something which should be implemented in the python-scraperlib.

For instance even with only this PR, it is already possible to do things like below which are not "that" awful. IllustrationInfo is just a structure to hold all details about the illustration:

creator.add_illustration(IllustrationInfo(48, 48), png_data)
creator.add_illustration(IllustrationInfo(48, 48, 2), png_data)

Details point are all valid.

benoit74 avatar Nov 07 '25 16:11 benoit74

I feel like your python-glue is something which should be implemented in the python-scraperlib.

I don't think so but this is debatable of course.

scraperlib is not intended for many users. It is, as its name implies, scraper-oriented and quite large and have many dependencies. We can do as we please there because we are the main users and requiring explicit extra types for this is OK there. We're already enforcing a lot of stuff, because we're the ones using it.

pylibzim on the other end is to be used by hopefully many users. It has zero runtime dependency, is lightweight and lean. I am a strong advocate of the pure-binding strategy but in this case, the libzim API is technically-oriented and not user oriented. I'd prefer the same I asked for months ago which is exposing what we are using: width, height, scale. If it were not for those extraAttributes that were not part of the original request, we would not have this intermediate type.

It just seems silly to push this uncomfortable change to our users but let's keep this PR like this and open a separate ticket to discuss whether we want to make an exception or not.

rgaudin avatar Nov 07 '25 16:11 rgaudin

Simple changes pushed, but I'm not comfortable with merging this if you are not comfortable with this PR "design" ; changing this afterwards might means breaking the API so needs a major, uncomfortable changes for "users", etc ...

I see no urgency in pushing these changes to the python-libzim, since we have no use-case so far, and would prefer to have a decision comfortable for everyone. And I somehow share your concerns, especially on the reader side. I'm less embarrassed by the fact that we need the new IllustrationInfo() type than you are.

Since python-libzim is anyway not used by our readers (so far, unfortunately?), maybe a simple middle ground would be:

  • add additional glue for add_illustration(width, height, scale, data), add_illustration(size, scale, data) (squared), get_illustration_item(width, height, scale) and get_illustration_item(size, scale) (squared), since they really don't provide anything besides simple glue we should be able to support "forever"
  • let the more complex issue about how readers find their illustration untouched, it is still unclear if we need a common solution to that or should let each readers find their way through this based on their needs (e.g. when a reader would prefer a 96x96@1 and it does not exists but 48x48@2 and 126x126@1 do exist, do we return both, only biggest res, ...)

WDYT?

benoit74 avatar Nov 10 '25 08:11 benoit74

I'm fine with this simple glue strategy. It's dumb enough to not be a problem, as you wrote.

Regarding the more complex reader usage, there's a ticket for that on libzim, it will definitely be implemented there but first step was to allow addition of multiple scales and width.

rgaudin avatar Nov 10 '25 09:11 rgaudin

Hum, I'm afraid that in fact this simple glue strategy is not that simple, because it will break the API or introduce dirty glue code for add_illustration.

Currently second argument to add_illustration is the content. If content can be either second or third or fourth depending on other arguments, this is either brittle (we count arguments to know where content is) or breaking (we force content to be a named argument).

Do you have suggestion about how to handle this? Should we introduce different method names? I'm puzzled

Same kind of issue with the scale in get_illustration_item(width, height, scale) and get_illustration_item(size, scale), should we have scale first or should we named arguments or something else?

benoit74 avatar Nov 10 '25 09:11 benoit74

Well libzim 9.4.0 did break the API (version doesn't tell because they don't respect semver) so I see no reason we wouldn't allow ourselves to.

Also, it's a binding not a pure copy which it can't be, especially since python doesn't support overloading natively. To me it would only make sense to have the content the first param and the rest afterwards. It could even be a single func with width_or_squaresize, height, scale.

What do you think?

rgaudin avatar Nov 11 '25 09:11 rgaudin

I don't know ; I will make a first release with 9.4.0 and Python 3.13 but without this PR, and then we will have time to take informed decision

benoit74 avatar Nov 13 '25 12:11 benoit74