distributed-wikipedia-mirror
Future: PWA and reading ZIMs directly from IPFS
It has been a while, and people may wonder what is next for this project, so here is an update. Thank you for being patient. Feel free to reach out to me if you are interested in helping with any of this.
After years, we are now in a position where a PWA reader of ZIM archives (kiwix-js) can read ZIMs from IPFS without depending on centralized infrastructure:
New IPFS tools and protocols
- trustless gateways: modern verifiable block/CAR responses allow for using public gateways as HTTP mirrors
- if block-by-block is not efficient enough, we now have IPIP-402 for partial CARs with blocks only for specific byte-ranges
- Kubo 0.23 shipped with experimental GET /routing/v1 server for delegated routing lookups
- client can opportunistically check if gateway exposes spec-compliant HTTP endpoint and use it before falling back to expensive DHT walk
- this also allows for discovering more HTTP gateways (by /https and /http/tls multiaddrs)
- libp2p introduced /webtransport which allows for peer to peer connectivity in web browser without mixed-content warnings (https://connectivity.libp2p.io/#webtransport)
- /webtransport is enabled by default in Kubo which is ~>80% of the public swarm
- js-ipfs got replaced by Helia, which supports delegated routing and bitswap over /webtransport
- kiwix-js has been improving all the time ❤️ and seems to be a fully functional PWA now
- its only downside is that it requires the user to prefetch the entire ZIM before use
- adding IPFS support enables users to access Wikipedia without having to fetch the entire thing
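To make the "verifiable" part of trustless gateways concrete, here is a minimal sketch of how a client could check a raw block against the CID it asked for. It assumes a base32 CIDv1 with a sha2-256 multihash and a block fetched with `?format=raw`; the function names are hypothetical and not taken from kiwix-js or Helia, and a real client would use a proper CID library instead of hand-rolled parsing.

```python
import base64
import hashlib


def _read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode one unsigned varint starting at pos; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def sha256_digest_from_cid(cid: str) -> bytes:
    """Extract the sha2-256 digest from a base32 CIDv1 (sketch: no other forms handled)."""
    if not cid.startswith("b"):
        raise ValueError("this sketch only handles base32 (multibase 'b') CIDs")
    body = cid[1:].upper()
    body += "=" * (-len(body) % 8)  # multibase base32 omits padding
    raw = base64.b32decode(body)
    version, pos = _read_varint(raw, 0)
    _codec, pos = _read_varint(raw, pos)
    hash_code, pos = _read_varint(raw, pos)
    length, pos = _read_varint(raw, pos)
    if version != 1 or hash_code != 0x12 or length != 32:
        raise ValueError("this sketch only handles CIDv1 with sha2-256")
    return raw[pos:pos + length]


def block_matches_cid(cid: str, block: bytes) -> bool:
    """True when the bytes returned by a gateway hash to the digest inside the CID."""
    return hashlib.sha256(block).digest() == sha256_digest_from_cid(cid)


def make_raw_cid(block: bytes) -> str:
    """Demo helper: build a CIDv1 (raw codec 0x55, sha2-256) for some bytes."""
    raw = bytes([0x01, 0x55, 0x12, 0x20]) + hashlib.sha256(block).digest()
    return "b" + base64.b32encode(raw).decode().lower().rstrip("=")


if __name__ == "__main__":
    cid = make_raw_cid(b"hello")
    print(cid, block_matches_cid(cid, b"hello"), block_matches_cid(cid, b"tampered"))
```

Because the check only needs the CID and the returned bytes, it does not matter which gateway served the block, which is what lets untrusted public gateways act as mirrors.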
Future of this Distributed Wikimedia Mirror
All languages available.
With search.
With regular web browser.
Focus on ZIMs
With these building blocks, we can start working towards reimagining this project to be focused on putting ZIMs on IPFS and ensuring they are pinned in multiple places.
We can put all ZIMs for all languages on IPFS, and these ZIMs are not only useful for this project, but also act as additional mirrors for https://download.kiwix.org/zim/wikipedia/
Leverage kiwix-js
The browser would still be enough for accessing Wikipedia, but we no longer need to unpack ZIMs and modify HTML/JS. All the operational cost here is gone, and that effort can be contributed elsewhere.
An instance of kiwix-js would load specific ZIM by its CID.
Focus on censorship-avoidance and resiliency
Resiliency can be facilitated by using the best IPFS provider available:
- local gateway when present (http://127.0.0.1:8080/ipfs/cid for IPFS Desktop and Kubo, or :48080 for Brave)
- fetch blocks and CARs from hardcoded list of public gateways as a fallback
- leverage /routing/v1 for finding additional providers and gateways
- leverage local IPFS node in JS running in-memory as the ultimate fallback
- p2p retrieval and routing when all gateways are censored (webrtc signaling, relays)
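The gateway part of this fallback order could look roughly like the sketch below. The local ports come from the list above; everything else is an assumption for illustration: `gateway.example.org` is a placeholder, the `Providers`/`Addrs` JSON shape is what I expect a `/routing/v1/providers/{cid}` response to look like, the multiaddr handling is deliberately simplistic, and `fetch_block` is a hypothetical callable the caller supplies (so the ordering logic can be tried without network access). The in-memory node and p2p steps are not covered here.

```python
import json
from typing import Callable, Iterable, Optional

# Local gateways first (ports from the list above), then a hardcoded public list.
LOCAL_GATEWAYS = ["http://127.0.0.1:8080", "http://127.0.0.1:48080"]
PUBLIC_GATEWAYS = ["https://gateway.example.org"]  # hypothetical placeholder


def multiaddr_to_gateway_url(addr: str) -> Optional[str]:
    """Turn e.g. /dns4/host/tcp/443/https into an https:// URL (simplified sketch)."""
    parts = addr.strip("/").split("/")
    if len(parts) < 4 or not parts[0].startswith("dns") or parts[2] != "tcp":
        return None
    host, port, rest = parts[1], parts[3], parts[4:]
    if "https" in rest or ("tls" in rest and "http" in rest):
        return f"https://{host}" if port == "443" else f"https://{host}:{port}"
    return None


def gateways_from_routing_response(body: str) -> list[str]:
    """Collect HTTP gateway URLs from an assumed /routing/v1 providers JSON body."""
    urls: list[str] = []
    for provider in json.loads(body).get("Providers") or []:
        for addr in provider.get("Addrs") or []:
            url = multiaddr_to_gateway_url(addr)
            if url and url not in urls:
                urls.append(url)
    return urls


def fetch_with_fallback(
    cid: str,
    gateways: Iterable[str],
    fetch_block: Callable[[str], Optional[bytes]],
) -> Optional[tuple[str, bytes]]:
    """Try each gateway in order; return (gateway, bytes) from the first that answers."""
    for base in gateways:
        try:
            data = fetch_block(f"{base}/ipfs/{cid}?format=raw")
        except OSError:  # connection refused, timeouts, urllib's URLError, ...
            continue
        if data is not None:
            return base, data
    return None
```

A caller would pass something like `LOCAL_GATEWAYS + PUBLIC_GATEWAYS + discovered` as `gateways`, and should still verify the returned bytes against the CID before using them.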
Action items to make this future real:
- upload ZIMs to IPFS
- create a webapp with kiwix-js that pulls these ZIMs from IPFS (via byte ranges) instead of asking for file permission to read from disk
- make sure pathing is adapted appropriately and fix other cosmetic issues
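For the byte-range item, a rough sketch of the two request shapes a reader could use: a plain HTTP `Range` request against a gateway, and a partial CAR request in the style of IPIP-402. The helper names and the example CID are hypothetical, and the `dag-scope` / `entity-bytes` parameters are written from my reading of IPIP-402, so the exact range semantics should be checked against the spec before relying on them.

```python
from urllib.parse import urlencode
from urllib.request import Request


def range_request(gateway: str, cid: str, start: int, end: int) -> Request:
    """Build (not send) a plain HTTP Range request; the end offset is inclusive."""
    return Request(
        f"{gateway}/ipfs/{cid}",
        headers={"Range": f"bytes={start}-{end}"},
    )


def partial_car_url(gateway: str, cid: str, start: int, end: int) -> str:
    """Build a URL asking for a CAR with only the blocks covering a byte range.

    Parameter names follow IPIP-402 as I understand it (assumption).
    """
    query = urlencode(
        {"format": "car", "dag-scope": "entity", "entity-bytes": f"{start}:{end}"},
        safe=":",
    )
    return f"{gateway}/ipfs/{cid}?{query}"


if __name__ == "__main__":
    # "bafyexample" is a placeholder, not a real ZIM CID.
    req = range_request("http://127.0.0.1:8080", "bafyexample", 0, 1023)
    print(req.full_url, req.get_header("Range"))
    print(partial_car_url("http://127.0.0.1:8080", "bafyexample", 0, 1023))
```

The plain `Range` form is simpler but trusts the gateway, while the CAR form returns blocks that can be verified against the CID.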
Notes:
- start with a language other than EN, since EN is so large.