ENH: Optimize getting named destinations
This optimization solves the performance issue introduced by PR #3400.
Without this optimization test_merger.py takes 124 seconds on my laptop after PR 3400. With this optimization it takes 6.6 seconds.
Very likely this optimization will benefit other code as well.
@stefan6419846 I need some help with this. ruff and Python both are complaining about Dict in my added code, claiming it's not declared. It is imported on line 39, though. And other code uses it in exactly the same way, without ruff or Python complaining. It works for me in both Python and ruff locally. What is going on?
Please try to rebase on the latest main - your base appears to be outdated and the corresponding import is gone in main.
Codecov Report
:white_check_mark: All modified and coverable lines are covered by tests.
:white_check_mark: Project coverage is 96.98%. Comparing base (bc318d7) to head (8465c0d).
:warning: Report is 5 commits behind head on main.
Additional details and impacted files
@@ Coverage Diff @@
## main #3442 +/- ##
=======================================
Coverage 96.97% 96.98%
=======================================
Files 54 54
Lines 9337 9344 +7
Branches 1711 1713 +2
=======================================
+ Hits 9055 9062 +7
Misses 168 168
Partials 114 114
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
:rocket: New features to boost your workflow:
- :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
That worked. Thank you! The PR should be ready now.
Thanks for the PR. This code seems to assume that the named destinations never change. Is this really a valid assumption or will this create unexpected behavior in specific cases?
It does assume the destinations never change. Originally I put this cache on the shared superclass for readers and writers, but then had the same realization as you. So now this code is only in the PdfReader class. That works for the purposes of the link rewriting code etc, but it does mean that getting named destinations from ´PdfWriter` is as slow as it used to be.
Thanks for the explanation - indeed, in the reader it should not be an issue. Are we able to add a test for this?
Hmmm. What sort of tests are you looking for? Tests of changing the named destinations in the reader? I guess not, since that's not supposed to be possible. Tests of changing the named destinations in the writer? It doesn't use this code, but I guess I could add. Or just tests of reading named destinations at all? Those might already exist for all I know, but I could check, once I know what you're thinking.
Sorry for not being explicit enough - I am referring to the fact that these are cached. One possible approach would be to check the time for handling many destinations the first and the second time, although I am open for other tests as well.