Non-public ZIM files should better be uploaded to S3

Open kelson42 opened this issue 4 years ago • 6 comments

Most of what is currently uploaded to http://download.kiwix.org/zim/.hidden/ would be better uploaded to S3 because:

  • S3 is slightly more flexible than our classic hosting
  • These files should not be mirrored
  • These files (or their paths) should preferably not be public
  • We could easily handle the deletion of older versions

kelson42 • Dec 11 '21 11:12

> Most of what is currently uploaded to http://download.kiwix.org/zim/.hidden/ would be better uploaded to S3 because:
>
> * S3 is slightly more flexible than our classic hosting

What kind of flexibility? It certainly scales transparently in terms of storage space, but that's it. The rest is not flexible: operations must go through a custom authentication and API.

> * These files should not be mirrored

Agreed. They could be stored outside /zim, though, and then they would not be mirrored.

> * These files (or their paths) should preferably not be public

Even on S3, we won't make them private. They just wouldn't be browsable. My understanding is that we don't want to advertise those ZIMs, but they are not private per se.
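The "reachable but not browsable" setup described here could be expressed as a bucket policy that grants anonymous reads on individual objects without granting listing. This is only a minimal sketch, assuming an AWS-style policy document; the bucket name `hidden-zims` and the helper name are hypothetical, and other S3-compatible providers may behave differently.

```python
import json


def public_read_no_list_policy(bucket: str) -> dict:
    """Build a policy that allows anonymous GetObject and nothing else.

    Because no listing action is granted, a client must already know
    the object key (i.e. the URL) to fetch a file.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowAnonymousReadOfKnownKeys",
                "Effect": "Allow",
                "Principal": "*",
                "Action": "s3:GetObject",
                # Object-level resource only; the bucket itself is not listed.
                "Resource": f"arn:aws:s3:::{bucket}/*",
            }
        ],
    }


# "hidden-zims" is a made-up bucket name for illustration.
policy = public_read_no_list_policy("hidden-zims")
print(json.dumps(policy, indent=2))
```

With such a policy, anyone holding the URL can download the file, which matches the point above: the files are unadvertised rather than private.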

> * We could easily handle the deletion of older versions

How so? S3 would allow us to set an arbitrary expiration date, which might not actually be flexible enough.
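For reference, an age-based expiration on S3 is typically declared as a lifecycle rule. The sketch below only builds the rule document; the prefix `custom/`, the rule ID and the 30-day value are assumptions made up for illustration, not settings from this project.

```python
def expiration_rule(rule_id: str, prefix: str, days: int) -> dict:
    """Build an S3 lifecycle configuration with one age-based expiry rule."""
    if days < 1:
        raise ValueError("days must be a positive integer")
    return {
        "Rules": [
            {
                "ID": rule_id,
                "Filter": {"Prefix": prefix},  # applies to every key under prefix
                "Status": "Enabled",
                "Expiration": {"Days": days},  # counted from object creation
            }
        ]
    }


# Hypothetical values: expire everything under "custom/" after 30 days.
config = expiration_rule("expire-custom-zims", "custom/", 30)
print(config)

# With boto3, this document could be passed as the LifecycleConfiguration
# argument of put_bucket_lifecycle_configuration (not executed here).
```

A rule like this expires objects by age, whether or not a newer version has been uploaded, which illustrates the flexibility concern raised above.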


How would dev.library.kiwix.org work with the ZIMs on S3?

rgaudin • Dec 11 '21 16:12

The files are not private, but it is easy to hide the URL in the Zimfarm, whereas it is not easy to hide a file on download.kiwix.org. For the upload, I would keep track of the previous upload and, each time we upload a new version, put a timeout of X on the older version so that it self-destructs. All of this can be scripted easily and properly thanks to the S3 API, which is the kind of thing that is not as straightforward, robust and secure with a remote filesystem.
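The bookkeeping described here could look roughly like the sketch below. It is not Zimfarm code: `UploadTracker`, the `recipe` grouping key and the injected `set_expiry` callable are all hypothetical names, and the actual storage call that sets the expiry is deliberately left as a placeholder.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable, Dict, Optional


@dataclass
class Upload:
    key: str
    uploaded_at: datetime
    expires_at: Optional[datetime] = None


class UploadTracker:
    """Remember the latest upload per recipe and expire the one it replaces."""

    def __init__(self, set_expiry: Callable[[str, datetime], None], grace: timedelta):
        self._set_expiry = set_expiry  # placeholder for the real storage call
        self._grace = grace            # the "timeout of X" for older versions
        self._latest: Dict[str, Upload] = {}

    def record_upload(self, recipe: str, key: str,
                      now: Optional[datetime] = None) -> Optional[Upload]:
        """Register a new upload; return the superseded upload, if any."""
        now = now or datetime.now(timezone.utc)
        previous = self._latest.get(recipe)
        superseded = None
        if previous is not None and previous.key != key:
            # Schedule the older file for deletion after the grace period.
            previous.expires_at = now + self._grace
            self._set_expiry(previous.key, previous.expires_at)
            superseded = previous
        self._latest[recipe] = Upload(key=key, uploaded_at=now)
        return superseded


# Usage with a fake storage backend that only records the calls.
calls = []
tracker = UploadTracker(lambda key, when: calls.append((key, when)),
                        grace=timedelta(days=7))
t0 = datetime(2021, 12, 11, tzinfo=timezone.utc)
first = tracker.record_upload("wikimed", "wikimed_2021-11.zim", now=t0)
old = tracker.record_upload("wikimed", "wikimed_2021-12.zim",
                            now=t0 + timedelta(days=30))
print(old.key, old.expires_at)
```

Injecting `set_expiry` keeps the tracking logic independent of whichever storage API ends up performing the deletion.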

kelson42 • Dec 11 '21 16:12

> The files are not private, but it is easy to hide the URL in the Zimfarm, whereas it is not easy to hide a file on download.kiwix.org.

As easy as on S3…

> For the upload, I would keep track of the previous upload and, each time we upload a new version, put a timeout of X on the older version so that it self-destructs. All of this can be scripted easily and properly thanks to the S3 API, which is the kind of thing that is not as straightforward, robust and secure with a remote filesystem.

Not sure I agree here. If we keep the Zimfarm a ZIM-building farm that just uploads a file and is then done with it (from the ZF's point of view), then no, it's not easier to do it with S3, which is a remote FS that must be queried through its API, versus being able to run local scripts on the file server.

But from your comment, I understand you want the Zimfarm to be responsible for the ZIM's lifecycle: keep a record of when and where a ZIM was uploaded, and delete it (set an expiry) once it has been replaced. That's probably more robust in the long run, but it's a conceptual switch that we should discuss and think through with all tools in mind, i.e. clearly define the responsibilities of the Zimfarm, the download servers and the CMS.

rgaudin • Dec 13 '21 07:12

Just to say I need to access the Custom ZIMs to make WikiMed. Is this issue just about using a more flexible server, or is it about making these ZIMs inaccessible to the public? I think those are two issues that might have different answers and rationales...

Jaifroid • Dec 19 '21 19:12

@Jaifroid No worries. Either you won't be impacted, or you will be kept informed of the URL where the ZIM files can be found.

kelson42 • Dec 19 '21 20:12

This issue has been automatically marked as stale because it has not had recent activity. It will now be reviewed manually. Thank you for your contributions.

stale[bot] • Mar 02 '22 09:03