Integrate Kiwix with zsync
PROBLEM Many .zim files, e.g. for Wikipedia, are huge. To update your Wikipedia ZIM, you must re-download everything. This takes time, uses a lot of bandwidth, means that new torrents have to be started frequently, and fragments the people seeding torrents across versions.
PROPOSAL Wikipedia releases an annual ZIM, e.g. 2019, which is torrented. Then there are, say, four updates per year, arriving in spring, summer, autumn, and winter. These updates would include the new edits and added material. Something like zsync would be used to update the downloaded torrent file to the more recent version.
This would let people seeding the same torrent (the 2019 one) accumulate throughout the year, while still allowing everyone to stay up to date without re-downloading the entire Wikipedia ZIM.
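As a rough sketch of what the proposed workflow might look like on the client side: the annual file stays in place for seeding, and zsync builds the quarterly release next to it, reusing whatever blocks match. The file names and the URL below are hypothetical placeholders, not real Kiwix releases; only the `-i` (seed file) and `-o` (output file) options of zsync are assumed.

```shell
#!/bin/sh
# Sketch: refresh a seeded annual ZIM to a quarterly release with zsync.
# All file names and the URL below are hypothetical placeholders.
BASE_ZIM="wikipedia_en_all_2019.zim"            # annual file obtained by torrent, kept for seeding
UPDATE_ZIM="wikipedia_en_all_2019-spring.zim"   # hypothetical quarterly release
ZSYNC_URL="https://example.org/zim/${UPDATE_ZIM}.zsync"  # hypothetical control-file URL

# -i names a local seed file whose matching blocks are reused;
# -o names the output file, so the annual file is only read, not replaced.
UPDATE_CMD="zsync -i $BASE_ZIM -o $UPDATE_ZIM $ZSYNC_URL"

# Print the command by default; set RUN=1 to actually execute it.
if [ "${RUN:-0}" = "1" ]; then
  $UPDATE_CMD
else
  echo "$UPDATE_CMD"
fi
```

Writing to a separate output file means the 2019 file can keep seeding unchanged, at the cost of holding two large files on disk.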
@yeehi Interesting. I ran a test and it saves ~60% of the bandwidth!
$ zsyncmake -u 'http://mirror.download.kiwix.org/zim/wikipedia/wikipedia_en_medicine_novid_2018-10.zim' wikipedia_en_medicine_novid_2018-10.zim
$ mv wikipedia_en_medicine_novid_2018-10.zim wikipedia_en_medicine_novid_2018-10.zim.old
$ zsync -i wikipedia_en_medicine_novid_2018-09.zim wikipedia_en_medicine_novid_2018-10.zim.zsync
reading seed file wikipedia_en_medicine_novid_2018-09.zim:
Read wikipedia_en_medicine_novid_2018-09.zim. Target 57.6% complete.
downloading from http://mirror.download.kiwix.org/zim/wikipedia/wikipedia_en_medicine_novid_2018-10.zim:
#################### 100.0% 10741.9 kBps DONE
verifying download...checksum matches OK
used 706150400 local, fetched 519752649
$ ls -la *
-rw-r--r-- 1 kelson kelson 1208191347 Dez 15 16:05 wikipedia_en_medicine_novid_2018-09.zim
-rw------- 1 kelson kelson 1225768642 Dez 15 12:05 wikipedia_en_medicine_novid_2018-10.zim
-rw------- 1 kelson kelson 1225768642 Dez 15 14:05 wikipedia_en_medicine_novid_2018-10.zim.old
-rw-r--r-- 1 kelson kelson 2394378 Dez 15 16:13 wikipedia_en_medicine_novid_2018-10.zim.zsync
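The saving can be checked from the numbers in the transcript above. This is a minimal sketch that takes the "fetched" byte count from the zsync output and the size of the 2018-10 file from the `ls` listing; the variable names are only illustrative.

```shell
#!/bin/sh
# Figures copied from the zsync run and the ls listing above.
fetched_bytes=519752649    # "fetched": bytes downloaded from the mirror
target_bytes=1225768642    # size of the 2018-10 ZIM according to ls

# POSIX shell arithmetic is integer-only, so awk does the percentages.
fetched_pct=$(awk -v f="$fetched_bytes" -v t="$target_bytes" 'BEGIN { printf "%.1f", 100 * f / t }')
saved_pct=$(awk -v f="$fetched_bytes" -v t="$target_bytes" 'BEGIN { printf "%.1f", 100 * (1 - f / t) }')

echo "fetched ${fetched_pct}% of the target, saved ${saved_pct}%"
```

The saved share lines up with the "Target 57.6% complete" that zsync reported after reading the seed file, i.e. a bit under the ~60% quoted.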
@mgautierfr @rgaudin If you are curious, please give it a try too; it might be a way to deal with the problem of incremental updates. That said, if we want to keep the advantages of aria2c, some work to integrate the two would be necessary.
I have opened a feature request on the aria2c side: https://github.com/aria2/aria2/issues/1320
@yeehi I have noticed that MirrorBrain supports zsync, so we will try to activate it on the download.kiwix.org end to see if it works fine. See https://github.com/kiwix/maintenance/issues/37
@kelson42 Thank you very much for using your skills to assist! It is great that you were able to check zsync with the wiki already.
This issue has been automatically marked as stale because it has not had recent activity. It will now be reviewed manually. Thank you for your contributions.
Maybe we should consider IPFS as well, which offers functionality similar to zsync. See https://github.com/ipfs/distributed-wikipedia-mirror/issues/71
@kelson42 any update on this? Btw, I see that there's now a rewrite of zsync: https://github.com/AppImage/zsync2
No update. We are looking more in the direction of IPFS for the moment.