grist-core
file versions on bucket
Hi,
It looks like there is an issue with the versions of meta.json on the bucket. On a test cluster I have:
# m3vewxmigDFEr4fjtbotkn.grist
mcli ls local-grist/grist/docs/m3vewxmigDFEr4fjtbotkn.grist --versions
[2024-02-15 17:55:37 CET] 200KiB STANDARD 7f39bcfd-132b-4877-9a91-fa6562cfca2d v5 PUT m3vewxmigDFEr4fjtbotkn.grist
[2024-02-15 17:54:22 CET] 192KiB STANDARD 73e9f85d-479f-40b6-840e-1b83fb5c96b1 v4 PUT m3vewxmigDFEr4fjtbotkn.grist
[2024-02-15 16:53:50 CET] 188KiB STANDARD 4e478f07-8543-4b3e-a237-380acfd8f85e v3 PUT m3vewxmigDFEr4fjtbotkn.grist
[2024-02-15 16:52:55 CET] 184KiB STANDARD 179667ff-1478-4be3-87be-262188fd1ecb v2 PUT m3vewxmigDFEr4fjtbotkn.grist
[2024-02-15 16:48:56 CET] 180KiB STANDARD 808aecab-806c-40d3-9fd2-c21dd1352f72 v1 PUT m3vewxmigDFEr4fjtbotkn.grist
# assets/unversioned/m3vewxmigDFEr4fjtbotkn/meta.json
mcli ls local-grist/grist/docs/assets/unversioned/m3vewxmigDFEr4fjtbotkn/meta.json --versions
[2024-02-15 17:55:37 CET] 1.2KiB STANDARD 650f2ede-f0df-4346-b2e4-572b9388d575 v12 PUT meta.json
[2024-02-15 17:55:37 CET] 1.5KiB STANDARD 117c65b3-b539-45ef-819f-37ccea0f497a v11 PUT meta.json
[2024-02-15 17:54:22 CET] 1.2KiB STANDARD 589715ef-0189-49d1-9ba2-d1dac1979097 v10 PUT meta.json
[2024-02-15 17:54:22 CET] 1.5KiB STANDARD 738bf8cc-aaf2-4e7d-b1d8-ca0939c690a0 v9 PUT meta.json
[2024-02-15 16:53:56 CET] 1.2KiB STANDARD e2d6faa0-1ff6-40aa-86a1-d7582c4b4178 v8 PUT meta.json
[2024-02-15 16:53:50 CET] 1.5KiB STANDARD e87d0400-1fe6-42d9-a41f-426c7ef4b863 v7 PUT meta.json
[2024-02-15 16:52:55 CET] 1.2KiB STANDARD 29529bdf-5d9a-4651-b7b3-46f22542fc72 v6 PUT meta.json
[2024-02-15 16:48:56 CET] 1011B STANDARD 3a4cebcf-056c-4bec-a0ed-8f9e6964bc81 v5 PUT meta.json
[2024-02-15 16:42:13 CET] 758B STANDARD 27519ede-666f-4a4d-9aa7-7beca68e63ae v4 PUT meta.json
[2024-02-15 16:40:37 CET] 506B STANDARD cce96134-7efe-4835-8767-341bd2565b63 v3 PUT meta.json
[2024-02-15 16:24:25 CET] 254B STANDARD ee919481-28a1-46aa-a1a2-e70513fe0741 v2 PUT meta.json
[2024-02-15 16:24:06 CET] 2B STANDARD 8a38f30f-2d41-40db-8db6-7847084d177f v1 PUT meta.json
Versions of meta.json never seem to be cleaned up. This may become a problem in the future. In our production environment, some meta.json files have more than 1500 versions.
mcli ls REDACTED/docs/pSj56ciNdcojWWXuram5En.grist --versions | wc -l
69
mcli ls REDACTED/docs/assets/unversioned/pSj56ciNdcojWWXuram5En/meta.json --versions | wc -l
1859
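When listing many objects at once, a per-object tally can be easier to read than a raw line count. The following is a minimal Python sketch that counts versions per object name; it assumes the listing lines have the same shape as the mcli output pasted above, and the names count_versions and LINE_RE are made up for this illustration.

```python
import re
from collections import Counter

# One line of `mcli ls --versions` output, as it appears in this thread:
# [timestamp] size storage-class version-id vN operation object-name
LINE_RE = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\]\s+(?P<size>\S+)\s+(?P<storage_class>\S+)\s+"
    r"(?P<version_id>[0-9a-f-]{36})\s+v(?P<seq>\d+)\s+(?P<op>\S+)\s+(?P<name>.+)$"
)


def count_versions(listing: str) -> Counter:
    """Count how many versions each object name has in a listing."""
    counts = Counter()
    for line in listing.splitlines():
        match = LINE_RE.match(line.strip())
        if match:  # lines that do not look like version entries are skipped
            counts[match.group("name")] += 1
    return counts


sample = """\
[2024-02-15 16:24:25 CET] 254B STANDARD ee919481-28a1-46aa-a1a2-e70513fe0741 v2 PUT meta.json
[2024-02-15 16:24:06 CET] 2B STANDARD 8a38f30f-2d41-40db-8db6-7847084d177f v1 PUT meta.json
[2024-02-15 16:48:56 CET] 180KiB STANDARD 808aecab-806c-40d3-9fd2-c21dd1352f72 v1 PUT m3vewxmigDFEr4fjtbotkn.grist
"""

for name, count in count_versions(sample).most_common():
    print(f"{count:6d}  {name}")
```

The output of mcli could be piped into a script like this to spot the objects with the most versions.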
Ah, we use lifecycle rules on the bucket to handle that, rather than doing it in code. Is that an option for you? Sorry for not having this gotcha in the documentation...
For now NoncurrentVersionExpiration is not supported by Scaleway's S3 compatible object storage, but should be implemented soon.
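For readers on a provider that does support it, here is a rough Python sketch of a lifecycle configuration using NoncurrentVersionExpiration, in the JSON shape used by the AWS S3 API. It is not the rule Grist Labs actually runs, which is not shown in this thread. The rule ID, the one-day delay and the helper name are made up, and the docs/assets/unversioned/ prefix is taken from the listings above, so it may differ in other deployments.

```python
import json


def noncurrent_expiration_rule(prefix: str, noncurrent_days: int = 1) -> dict:
    """Build a lifecycle configuration that expires noncurrent versions under a prefix."""
    if noncurrent_days < 1:
        # S3 requires a positive number of days for NoncurrentDays.
        raise ValueError("noncurrent_days must be a positive integer")
    return {
        "Rules": [
            {
                "ID": "expire-noncurrent-meta-json",  # hypothetical rule name
                "Status": "Enabled",
                # Only objects under this prefix are affected; .grist snapshots
                # stored elsewhere keep their own retention logic.
                "Filter": {"Prefix": prefix},
                "NoncurrentVersionExpiration": {"NoncurrentDays": noncurrent_days},
            }
        ]
    }


# Prefix assumed from the listings in this thread; adjust to your bucket layout.
config = noncurrent_expiration_rule("docs/assets/unversioned/")
print(json.dumps(config, indent=2))
```

The printed JSON could be saved to a file and applied with whatever lifecycle tooling the storage provider offers.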
Hmm, that's too bad. There is code for pruning snapshots at: https://github.com/gristlabs/grist-core/blob/7a0e0a9707f63e0807dfeb156b7fd54939d7fc7d/app/server/lib/DocSnapshots.ts#L20-L23 This is needed since the logic about which snapshots to retain is very custom. It could be adapted for meta.json, I suppose, although it would be very over-engineered for this task. Another option would be a change to allow storing the meta.json data in a separate bucket, which could then be configured separately. There is no technical reason it needs to be in the same bucket.
But if Scaleway's implementation is coming soon enough, maybe waiting works :)
@paulfitz Maybe a silly question: would it make sense to keep only one (or just a few) versions of meta.json files?
This way, anyone self-hosting Grist wouldn't have to worry about retention policies. What do you think?
Not silly @fflorent, keeping just the latest version of meta.json is fine, there's no reason for it to be versioned. It can be rebuilt, it is a cache. The only question is how to do that. One way would be to change the code to stick this data in an unversioned bucket. Another would be to add code to delete old versions. Another (the way we did it) is with a retention policy to say we only want the latest version of such files.
The existence of meta.json is actually kind of weird. It is there primarily to support a feature that was never built: to allow snapshots to have human-specified labels. The S3 protocol gives a way to store such labels but not to access them as efficiently as one might like.
(we used a retention policy since at the time this feature was built, we were focused on our own SaaS, and retention policies were available and easy)
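To make the "add code to delete old versions" option a bit more concrete, here is a minimal Python sketch of the selection step only. It is not Grist code: ObjectVersion and noncurrent_versions are made-up names, and the input is assumed to come from a version listing that flags the latest version, as S3 version listings do. The listing at the top of this thread shows two meta.json versions sharing the same timestamp, so the sketch relies on that flag rather than on sorting by time.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class ObjectVersion:
    """One entry of a version listing for a single object (hypothetical type)."""
    version_id: str
    is_latest: bool


def noncurrent_versions(versions: Iterable[ObjectVersion]) -> List[ObjectVersion]:
    """Return every version except the latest one, in listing order."""
    versions = list(versions)
    latest = [v for v in versions if v.is_latest]
    if len(latest) != 1:
        # Refuse to guess: deleting without a clear current version could
        # remove the only usable copy.
        raise ValueError(f"expected exactly one latest version, found {len(latest)}")
    return [v for v in versions if not v.is_latest]


# Version ids taken from the meta.json listing earlier in this thread.
listing = [
    ObjectVersion("650f2ede-f0df-4346-b2e4-572b9388d575", is_latest=True),
    ObjectVersion("117c65b3-b539-45ef-819f-37ccea0f497a", is_latest=False),
    ObjectVersion("589715ef-0189-49d1-9ba2-d1dac1979097", is_latest=False),
]
for version in noncurrent_versions(listing):
    print("would delete", version.version_id)
```

The actual deletion would then be one call per returned version id with whichever S3 client is in use; that part is left out here on purpose.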
It seems that the MinIO JS client supports getting the bucket lifecycle policy.
Regarding Scaleway and its lack of support for NoncurrentVersionExpiration, I agree that waiting is the simplest option.
Regarding MinIO and other S3 storage providers, I wonder whether this could be an enhancement to the diagnosis page: https://github.com/gristlabs/grist-core/pull/850
Once it is merged, we (the ANCT and/or DINUM) can take a look at implementing this, if that makes sense.
We had an issue with a document that could no longer be opened. For the record, here is an extract of the logs:
ext meta upload: <DOCID> failure to send, error A conflicting conditional operation is currently in progress against this resource. Please try again.
(Note the "ext meta upload" prefix)
Our workaround was to remove the old versions of meta.json using mc (remove --dry-run for the command to take effect, and replace <ALIAS>, <BUCKET> and <DOC_ID> with the relevant values):
$ mc rm --dry-run --force --versions --older-than 3d \
<ALIAS>/<BUCKET>/docs/assets/unversioned/<DOC_ID>/meta.json