Do not rewrite links twice, once statically and once dynamically

Open benoit74 opened this issue 1 year ago • 5 comments

See e.g. https://library.kiwix.org/content/noted.lol_en_all_2024-10/noted.lol/convert-any-website-into-a-zim-file-zimit/

At the bottom of the article, there is a link to the Zimit GitHub repository.

This link does not open properly on kiwix-serve. The same behavior is observed on the Apple and PWA readers, so this really looks like an issue with wombat intercepting the click event and mishandling the link target.

benoit74 avatar Oct 29 '24 08:10 benoit74

urlRewriten:
	- current_url: https://library.kiwix.org/content/noted.lol_en_all_2024-10/noted.lol/convert-any-website-into-a-zim-file-zimit/
	- orig_host: noted.lol
	- orig_scheme: https
	- orig_url: https://noted.lol/convert-any-website-into-a-zim-file-zimit/
	- prefix: https://library.kiwix.org/content/noted.lol_en_all_2024-10/
	- url: https://github.com/openzim/zimit?ref=noted.lol
	- useRel: false
	- mod: undefined
	- doc: undefined
	- finalUrl: https://library.kiwix.org/content/noted.lol_en_all_2024-10/github.com/openzim/zimit%3Fref%3Dnoted.lol
	[wombatSetup.js:2:18356](https://library.kiwix.org/content/noted.lol_en_all_2024-10/_zim_static/wombatSetup.js)

rgaudin avatar Oct 29 '24 10:10 rgaudin

lol ;)

kelson42 avatar Oct 29 '24 10:10 kelson42

Indeed!

Original website:

<script type="text/javascript">
    var links = document.querySelectorAll('a');
    links.forEach((link) => {
        var a = new RegExp('/' + window.location.host + '/');
        if(!a.test(link.href)) {
            link.addEventListener('click', (event) => {
                event.preventDefault();
                event.stopPropagation();
                window.open(link.href, '_blank');
            });
        }
    });
</script>

So all external links are opened from JavaScript, and their URL is therefore rewritten dynamically even though we decided not to rewrite it during static rewriting.
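To make the sequence concrete, here is a minimal sketch of what seems to happen, runnable under Node. `isInternal` repeats the host test from the site's script. `dynamicRewrite` is a hypothetical stand-in, not wombat's actual code: it only mimics the `url` to `finalUrl` mapping visible in the log above. The value used for the host is assumed to be `orig_host` from that log.

```javascript
// Hypothetical stand-in for the dynamic rewriter, NOT wombat's real code.
// It only reproduces the url -> finalUrl mapping seen in the log above.
function dynamicRewrite(prefix, url) {
  const u = new URL(url);
  // The query string ends up percent-encoded inside the ZIM path.
  return prefix + u.host + u.pathname + encodeURIComponent(u.search);
}

// Same test as the site's script: does the href contain "/<host>/"?
function isInternal(host, href) {
  return new RegExp('/' + host + '/').test(href);
}

const prefix = 'https://library.kiwix.org/content/noted.lol_en_all_2024-10/';
const origHost = 'noted.lol'; // assumption: host seen by the page script
const href = 'https://github.com/openzim/zimit?ref=noted.lol';

// The link fails the host test, so the site opens it from JavaScript,
// and that call is where the dynamic rewrite kicks in.
const finalUrl = isInternal(origHost, href) ? href : dynamicRewrite(prefix, href);
console.log(finalUrl);
```

Under these assumptions the external link ends up pointing inside the ZIM, at an entry that does not exist.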

Not sure how we can handle this. After all, we've said that all calls made from JavaScript must be rewritten for proper operation.

Should we add a marker during static rewriting that dynamic rewriting can then see and use, so that it knows the link has already been processed and that we decided not to rewrite it?
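One possible shape for such a marker, as a rough sketch only: static rewriting records the URLs it deliberately left untouched, and dynamic rewriting consults that record before doing anything. All names here (`keptAsIs`, `naiveRewrite`, `rewriteUnlessKept`) are hypothetical and exist in neither warc2zim nor wombat. A registry keyed by URL is used rather than an attribute on the `<a>` element because `window.open` only receives the URL string.

```javascript
const zimPrefix = 'https://library.kiwix.org/content/noted.lol_en_all_2024-10/';

// Hypothetical registry, assumed to be filled by static rewriting with
// every URL it decided to leave untouched.
const keptAsIs = new Set(['https://github.com/openzim/zimit?ref=noted.lol']);

// Hypothetical, simplified dynamic rewrite: prefix + URL without its scheme.
function naiveRewrite(url) {
  return zimPrefix + url.replace(/^https?:\/\//, '');
}

// Honor the static decision first, rewrite only otherwise.
function rewriteUnlessKept(url) {
  return keptAsIs.has(url) ? url : naiveRewrite(url);
}

console.log(rewriteUnlessKept('https://github.com/openzim/zimit?ref=noted.lol'));
console.log(rewriteUnlessKept('https://noted.lol/convert-any-website-into-a-zim-file-zimit/'));
```

The obvious cost is that the registry grows with the number of external links, so whether this is practical for large ZIMs is an open question.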

Or should we add yet another configuration switch to warc2zim to specify scripts into which wombat must not be injected, like this one? It is hard to designate which script to ignore, though, since it has no ID.

benoit74 avatar Oct 29 '24 13:10 benoit74

This also happens for www.cdc.gov, see https://github.com/openzim/zimit/issues/449

benoit74 avatar Jan 08 '25 12:01 benoit74

This also happens for activisthandbook.org (https://github.com/openzim/zim-requests/issues/1355). It looks like asset links are rewritten twice, once statically and once dynamically.

Web console in browser:

TypeError: error loading dynamically imported module: https://dev.library.kiwix.org/content/activisthandbook.org_en_all_2025-04/dev.library.kiwix.org/assets/index.md.3b9f4a11.lean.js

benoit74 avatar Apr 10 '25 08:04 benoit74