ZIMit 2.0

Open kelson42 opened this issue 2 years ago • 5 comments

Since its launch at the end of 2021, Zimit has proven to be an interesting tool for efficiently building portable, versatile offline versions of "random" websites. But what we have is only a version 1.0, and it still suffers from many weaknesses.

The most important weakness is that it relies on ServiceWorkers. Because of this we:

  • have to deal with all the HTTPS/Certificate stuff with Kiwix Hotspot
  • have to modify all our readers and this is almost impossible to achieve

We have worked hard to improve the situation over the last two years, but this has proven to be a really serious and challenging issue, not only for us as a software publisher but also for users, who have had a lot of trouble dealing with this kind of ZIM.

Fortunately, after studying in detail how the URL rewriting works (this is what all of this is about), we have managed to build a POC version of ZIMit 2.0 which provides the same level of features but without using a ServiceWorker. In a nutshell, this POC:

  • Stores rewritten HTML/JS/CSS in the ZIM file (URL rewriting using Pywb's code)
  • Loads Wombat (no SW needed) to do URL rewriting live in the web browser
  • Modifies libkiwix to allow a few things, for example data-driven fuzzy URL matching
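To make "data-driven fuzzy URL matching" a bit more concrete, here is a minimal Python sketch of the idea: when a request has no exact entry, a list of rules is tried to derive a canonical path. The rule format, the `FUZZY_RULES` list, the `resolve` helper and the example paths are all assumptions made for illustration; they are not the POC's or libkiwix's actual format or API.

```python
import re

# Hypothetical rule set (not the real format): each rule maps a regex over
# the requested path to a replacement giving the path stored in the ZIM.
FUZZY_RULES = [
    # Drop a cache-busting parameter such as "?_=1685100000".
    (re.compile(r"^(?P<base>[^?]+)\?_=\d+$"), r"\g<base>"),
    # Drop any query string on font files.
    (re.compile(r"^(?P<base>.+\.woff2?)\?.*$"), r"\g<base>"),
]


def resolve(path, entries):
    """Return the stored path for a request, or None when nothing matches."""
    if path in entries:  # an exact match always wins
        return path
    for pattern, replacement in FUZZY_RULES:
        candidate, count = pattern.subn(replacement, path)
        if count and candidate in entries:
            return candidate
    return None


# Stand-in for the set of entry paths in a ZIM (illustrative only).
entries = {"example.org/app.js", "example.org/fonts/a.woff2"}
print(resolve("example.org/app.js?_=1685100000", entries))
```

Because the rules are plain data, they could in principle ship alongside the content rather than being hard-coded in each reader.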

Here is a screencast of the POC (with a local ZIM file of kiwix.org)

We should now schedule the ZIMit 2.0 project so we can release it before the end of 2023.

kelson42 avatar May 26 '23 13:05 kelson42

Some important information for those who've been struggling with Service Worker and want some details:

  • The role of the SW is to replace a server backend. In the context of WR's Replayweb.page, serverless is a prerequisite.
  • In ZIM/Kiwix context, we already have a ZIM reader that can serve as a backend.
  • A very important constraint we assigned ourselves when building zimit was that we wanted to create regular ZIM files. We were not creating something else and thus shouldn't have to adapt our readers to zimit-made ZIMs.
  • We still had to adapt to SW but figured “it's standard web technology”.
  • 3 years later, the SW requirement has proved to be too much of a pain, because it requires a Secure Context.
  • This new approach works as follows:
    • We still store WARC Headers as individual ZIM entries
    • We still store WARC Payload as individual ZIM entries but:
      • Those are not raw Payload from crawler: HTML, CSS and JS entries go through pywb's Rewriter first
      • We also insert wombat-init-variables into every HTML entry
    • We don't include wabac.js anymore, so we don't register a SW, don't have a UI (iframe), and don't manage missing entries (404)
    • Wombat is included directly (it used to come with wabac.js) and has the same role: rewriting JS-emitted events
    • libkiwix (kiwix-serve or other readers) tests the fuzzy rules on requests that match no entry.
    • We probably want to store the fuzzy rules inside the ZIM and have them consumed by libkiwix. This will be available to all ZIMs.
    • libkiwix will also use the HTTP headers from the WARC Headers when sending the response. This will also be available to any ZIM; it might help with some duplicate cases.
    • libkiwix will also conditionally (maybe via a ZIM private tag) rewrite the response to replace some known variables that are required for wombat to work (like $SERVER_URL = "http://172.16.16.4:8080/my-zim/";)
  • Some of those details may change in the future. Check the POC details:
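As an illustration of the last two libkiwix points (this is not taken from the POC), a reader's response step could look roughly like the Python sketch below. It assumes the stored headers are plain `Name: value` lines and that the stored HTML contains a literal `$SERVER_URL` token; the `parse_headers` and `build_response` helpers and the `rewrite` flag are hypothetical names, and the real libkiwix code is C++ and may work quite differently.

```python
def parse_headers(raw):
    """Parse 'Name: value' lines into a dict with lower-cased names."""
    headers = {}
    for line in raw.splitlines():
        name, sep, value = line.partition(":")
        if sep:  # skip lines that are not 'Name: value'
            headers[name.strip().lower()] = value.strip()
    return headers


def build_response(body, raw_headers, server_url, rewrite):
    """Return (headers, body) roughly as a reader might send them."""
    headers = parse_headers(raw_headers)
    if rewrite:  # assumption: switched on by something like a private ZIM tag
        body = body.replace("$SERVER_URL", server_url)
    return headers, body


headers, body = build_response(
    '<script>var prefix = "$SERVER_URL";</script>',
    "Content-Type: text/html; charset=utf-8\nX-Frame-Options: DENY",
    "http://172.16.16.4:8080/my-zim/",
    rewrite=True,
)
print(headers["content-type"])
print(body)
```

The substitution has to happen at serve time because the reader's address and ZIM name are only known when the file is actually being served.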

This move should (it's only a proof of concept) free us from the most pressing issue we face with zimit and allow us to focus on other features. It also makes WARC more important for us and maybe ZIM more attractive to WARC users.

Interested parties are encouraged to subscribe to this ticket to be notified when implementation starts. We'll then probably look for real-world use cases to test the solution against.

rgaudin avatar May 26 '23 15:05 rgaudin

Found another scenario that could benefit from part of this solution.

With libkiwix reading/parsing/serving custom HTTP headers for entries, it would be possible for a ZIM reader to return an HTTP redirect for an entry to another URL (in-ZIM or not) with just a single lightweight H/ entry.
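A rough Python sketch of that redirect idea, assuming (hypothetically) that an H/ entry holds an HTTP status line followed by `Name: value` header lines; the actual entry layout and any reader-side API may differ, and `parse_header_entry` and `redirect_target` are made-up names.

```python
def parse_header_entry(raw):
    """Split a stored header block into (status_code, headers)."""
    lines = raw.strip().splitlines()
    # Assumed first line: "HTTP/1.1 301 Moved Permanently"
    status = int(lines[0].split()[1])
    headers = {}
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    return status, headers


def redirect_target(raw):
    """Return the Location to redirect to, or None if this is not a redirect."""
    status, headers = parse_header_entry(raw)
    if 300 <= status < 400:
        return headers.get("location")
    return None


entry = "HTTP/1.1 301 Moved Permanently\nLocation: https://example.org/new\n"
print(redirect_target(entry))
```

A reader that gets a non-None target would answer with the stored status and `Location` header instead of looking up a payload entry.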

Not sure if wanted though.

rgaudin avatar Jun 01 '23 11:06 rgaudin

Hi everybody,

Any news on this, or on a date when the beta will be available? We are waiting with bated breath for the possibility to use Zimit files in Kiwix.

All the best, Benjamin

BenjaminJMueller avatar Jun 26 '23 19:06 BenjaminJMueller

As of now, my best guess is 6 to 12 months.

kelson42 avatar Jun 26 '23 20:06 kelson42

See also https://github.com/kiwix/overview/issues/95

Jaifroid avatar Nov 22 '23 11:11 Jaifroid