libzim

Add support for redirect with fragments

Open · benoit74 opened this issue on Jul 08 '25 · 5 comments

Currently, the libzim API allows adding a redirect from one path to another path.

In several scrapers (mwoffliner, freecodecamp, mindtouch, youtube, ...), we need to add redirects so that the title search index has proper entries, but the target of these redirects must be a path plus a fragment.

For instance, in mwoffliner, when a user searches for "Geography of Nairobi" in WPEN (English Wikipedia), we need to redirect to "Nairobi#Geography" so that the user lands on the "Geography" section of the "Nairobi" article.

This is not supported by libzim, which only supports redirects to a path.

For now, the workaround consists of adding a real ZIM item instead of the redirect, with minimal HTML content that uses a meta refresh tag to redirect immediately:

```html
<html>
  <head>
    <title>Geography of Nairobi</title>
    <meta http-equiv="refresh" content="0;URL='./Nairobi#Geography'" />
  </head>
  <body></body>
</html>
```
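A scraper generating many of these stub pages could build them with a small helper. The sketch below is hypothetical (the name `make_redirect_stub` and its parameters are not part of libzim or of any scraper); it assumes "/"-separated ZIM paths and only produces the HTML string. Adding the item to the ZIM, and giving it a title so it shows up in title search, is left to the scraper's existing item-creation code.

```python
import html
import posixpath
from urllib.parse import quote


def make_redirect_stub(source_path: str, target_path: str, fragment: str, title: str) -> str:
    """Return a minimal meta-refresh page standing in for a fragment redirect.

    Hypothetical helper. source_path is where the stub item will be stored,
    target_path is the real article, fragment is the anchor id without "#".
    """
    # The refresh URL is resolved against the stub's own location, so compute
    # a path relative to the stub's directory (assumes "/"-separated paths).
    source_dir = posixpath.dirname(source_path) or "."
    relative = posixpath.relpath(target_path, start=source_dir)
    # A "./" prefix keeps a path such as "File:Foo" from being read as a URL scheme.
    if not relative.startswith(".."):
        relative = "./" + relative
    # Percent-encode unsafe characters in the path and the fragment.
    url = quote(relative, safe="/:") + "#" + quote(fragment, safe="")
    return (
        "<html>\n"
        "  <head>\n"
        f"    <title>{html.escape(title)}</title>\n"
        f"    <meta http-equiv=\"refresh\" content=\"0;URL='{html.escape(url)}'\" />\n"
        "  </head>\n"
        "  <body></body>\n"
        "</html>\n"
    )
```

Calling `make_redirect_stub("Geography_of_Nairobi", "Nairobi", "Geography", "Geography of Nairobi")` returns exactly the page shown in the issue; a stub stored at a nested path such as `a/b` gets `URL='../Nairobi#Geography'` instead.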

libzim should probably add native support for these redirects with fragments.

The downside is the impact on readers. If readers have to be adapted to benefit from this new functionality (which I suspect is the case), that would be a major drawback. For scrapers like freecodecamp, mindtouch and youtube, we use a Vue.js UI, so all redirects go to index.html and the fragment indicates which resource must be loaded. Fragment support is therefore mandatory for title search to work on these ZIMs: if an old reader cannot "find" the fragment in a newer ZIM, title search stops working.

But we should probably still move this forward, and perhaps wait a few months or years until enough updated readers are in the wild before starting to use it in ZIMs.

benoit74, Jul 08 '25 07:07

Can you provide a concrete example where this would be needed?

Anchors are a client-side technology, and libzim operates as a server-side replacement, so it leaves anchor handling to the client.

If I link to Geography#Nairobi and Geography is a redirect to Kenya, then I'll be taken to Kenya#Nairobi.

rgaudin, Jul 09 '25 15:07

> Can you provide a concrete example where this would be needed?
>
> Anchors are a client-side technology, and libzim operates as a server-side replacement, so it leaves anchor handling to the client.
>
> If I link to Geography#Nairobi and Geography is a redirect to Kenya, then I'll be taken to Kenya#Nairobi.

Linking to https://en.wikipedia.org/wiki/Geography_of_Nairobi will redirect you to Nairobi#Geography.

The anchor is added by the redirect only if the link contains no anchor of its own. (If you link to Geography_of_Nairobi#History, the redirect goes to Nairobi#History, as you described.)
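The precedence rule described here (the link's own fragment wins, and the redirect's fragment is only a fallback) also covers the Geography#Nairobi example above. A minimal sketch, assuming a hypothetical redirect table that maps a path to a (target path, optional fragment) pair; libzim's current redirects carry no fragment, and only a single redirect hop is resolved here.

```python
from typing import Dict, Optional, Tuple

# Hypothetical redirect table: source path -> (target path, fragment or None).
RedirectTable = Dict[str, Tuple[str, Optional[str]]]


def resolve_link(link: str, redirects: RedirectTable) -> str:
    """Resolve one redirect hop, keeping the link's own fragment if it has one."""
    path, _, link_fragment = link.partition("#")
    if path not in redirects:
        return link  # not a redirect: leave the link untouched
    target_path, redirect_fragment = redirects[path]
    # The link's fragment wins; the redirect's fragment is only a fallback.
    fragment = link_fragment or redirect_fragment
    return f"{target_path}#{fragment}" if fragment else target_path


redirects: RedirectTable = {
    "Geography_of_Nairobi": ("Nairobi", "Geography"),
    "Geography": ("Kenya", None),
}

print(resolve_link("Geography_of_Nairobi", redirects))          # Nairobi#Geography
print(resolve_link("Geography_of_Nairobi#History", redirects))  # Nairobi#History
print(resolve_link("Geography#Nairobi", redirects))             # Kenya#Nairobi
```

Where this rule would live (reader, server, or libzim itself) is exactly the open question in this thread; the sketch only pins down the expected results.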

Markus-Rost, Jul 09 '25 17:07

@benoit74 What kind of solution do you suggest in libzim, other than implementing the described workaround behind zim::writer::Creator::addRedirection() (or a new method)? I concur with @rgaudin that:

> Anchors are a client-side technology, and libzim operates as a server-side replacement, so it leaves anchor handling to the client.

veloman-yunkan, Jul 10 '25 08:07

I agree that anchors are a client-side technology and that libzim should just pass the proper information to the reader.

I also agree that if a scraper creates nasty redirects, it will not work.

The example use case for WPEN is indeed that we want the path Geography_of_Nairobi to redirect to Nairobi#Geography, i.e. the path to load is Nairobi and the reader should "move" the user to the Geography anchor.

I'm not sure that implementing the workaround mentioned above inside libzim (to avoid storing HTML "for nothing") is the proper solution, because @Markus-Rost told me this technique has been deprecated by the W3C (but I lack a proper reference for this).

I don't have an exact solution in mind for handling this correctly and as transparently as possible. I'm not even sure how readers currently retrieve redirects; I lack hands-on libzim experience there.

benoit74, Jul 10 '25 09:07

The current model for (proper) redirects in libzim simply precludes any handling of URL fragments, and I don't see any solution that differs in essence from the mentioned workaround: the best we can do is generate the HTML dynamically inside libzim when the redirect is accessed.
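The "generate the HTML dynamically" idea can be illustrated with a hypothetical entry model in which a redirect stores only its target path and an optional fragment, and a page body is synthesized only when such an entry is read. Nothing below reflects libzim's actual on-disk format or API; the `Redirect`/`Article` types and `read` function are illustrative assumptions, and the target is assumed to sit in the same directory as the redirect.

```python
import html
from dataclasses import dataclass
from typing import Dict, Optional, Tuple, Union


@dataclass
class Redirect:
    # Hypothetical record: only target path and optional fragment are stored,
    # so no HTML body has to be written into the archive.
    target_path: str
    fragment: Optional[str] = None


@dataclass
class Article:
    html_body: str


Entry = Union[Redirect, Article]


def read(path: str, entries: Dict[str, Entry]) -> Tuple[str, str]:
    """Return ("redirect", target) or ("html", body) for the entry at path."""
    entry = entries[path]
    if isinstance(entry, Article):
        return ("html", entry.html_body)
    if entry.fragment is None:
        # No fragment: behave like an ordinary path-to-path redirect.
        return ("redirect", entry.target_path)
    # Fragment redirect: synthesize the meta-refresh page at access time
    # (assumes the target lives in the same directory as the redirect).
    url = html.escape(f"./{entry.target_path}#{entry.fragment}")
    body = (
        "<html><head>"
        f"<meta http-equiv=\"refresh\" content=\"0;URL='{url}'\" />"
        "</head><body></body></html>"
    )
    return ("html", body)
```

The trade-off is the one raised above: plain redirects keep today's behaviour, but a reader that only understands path redirects would still need the synthesized page (or an equivalent) to land on the right section.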

veloman-yunkan avatar Jul 10 '25 09:07 veloman-yunkan