Bugfix/cldsrv 514 handling of metadata storage errors
A first set of fixes to reduce the occurrence of orphan creation, when the fix is "easy", that is, when we can delete the orphan in the same API.
Note: The code to set delete markers is safe, as only metadata is updated. However, when deleting the data (usually after the metadata), it becomes possible to create orphans in the storage. In this case, we only log it for now, pending a more consistent approach.
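The ordering described in the note above can be sketched as follows. This is a minimal illustration in Python, not the actual CloudServer code: `delete_object`, `FlakyStore`, and the in-memory `metadata` dict are hypothetical stand-ins, and the only behavior taken from the description is "delete metadata first, then data, and only log if the data delete fails".

```python
import logging

logger = logging.getLogger("orphans")


class DataStoreError(Exception):
    """Raised by the hypothetical data backend when a delete fails."""


def delete_object(metadata, data_store, key):
    """Delete metadata first, then data; only log if the data delete fails.

    Returns the data location left behind as an orphan, or None.
    """
    entry = metadata.pop(key)        # metadata-only update: no orphan yet
    location = entry["location"]
    try:
        data_store.delete(location)  # may fail once metadata is already gone
    except DataStoreError as err:
        # Nothing references this location anymore: it is now an orphan.
        logger.warning("orphan left in storage: %s (%s)", location, err)
        return location
    return None


class FlakyStore:
    """Hypothetical backend whose delete always fails (shows the orphan path)."""

    def __init__(self, blobs):
        self.blobs = dict(blobs)

    def delete(self, location):
        raise DataStoreError("backend unavailable")


metadata = {"obj": {"location": "loc-1"}}
store = FlakyStore({"loc-1": b"payload"})
orphan = delete_object(metadata, store, "obj")
```

In this sketch the request still succeeds from the client's point of view; the orphan is only visible in the logs, which matches the "only log it for now" behavior.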
Hello williamlardier,
My role is to assist you with the merge of this
pull request. Please type @bert-e help to get information
on this process, or consult the user documentation.
Request integration branches
Waiting for integration branch creation to be requested by the user.
To request integration branches, please comment on this pull request with the following command:
/create_integration_branches
Alternatively, the /approve and /create_pull_requests commands will automatically
create the integration branches.
I suppose that waiting for the 'more consistent' approach is why we don't see tests for the cases where we can't delete? Do you know what this approach will be?
If I understand your question correctly, @KazToozs, you are referring to the remaining orphans we create, or the cases where we only log it. My suggested approach, for Zenko, is to rely on transactions to perform atomic operations on the database. This way, we can easily avoid partial metadata updates that lead to orphans either on the storage side or in the metadata DB. This, however, requires more design work beforehand.
Another solution, to complement it, because we can still have orphans with atomic updates (as we delete data from 2 different storage backends), is to persist the list of known keys that are (maybe) orphans, and have an internal job (or manual operation) take care of them, if needed. This also requires some design.
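As a rough illustration of the second idea (persisting candidate orphans for a later job), here is a minimal Python sketch. Everything in it is an assumption made for the example: the `OrphanJournal` class, the JSON-lines file format, and the premise that the job can obtain the set of data locations still referenced by metadata. None of this has been designed yet.

```python
import json
import os
import tempfile


class OrphanJournal:
    """Append-only file of data locations that may be orphans (hypothetical)."""

    def __init__(self, path):
        self.path = path

    def record(self, location):
        with open(self.path, "a", encoding="utf-8") as fh:
            fh.write(json.dumps({"location": location}) + "\n")

    def pending(self):
        if not os.path.exists(self.path):
            return []
        with open(self.path, encoding="utf-8") as fh:
            return [json.loads(line)["location"] for line in fh if line.strip()]

    def clear(self):
        # Truncate the journal once every candidate has been examined.
        open(self.path, "w", encoding="utf-8").close()


def cleanup(journal, referenced_locations, data_store):
    """Delete journaled locations that no metadata entry references.

    Returns the list of locations actually deleted.
    """
    deleted = []
    for location in journal.pending():
        if location not in referenced_locations and location in data_store:
            del data_store[location]
            deleted.append(location)
    journal.clear()
    return deleted


journal = OrphanJournal(os.path.join(tempfile.mkdtemp(), "orphans.jsonl"))
journal.record("loc-A")  # data written, metadata write failed
journal.record("loc-B")  # flagged as a candidate, but still referenced

data_store = {"loc-A": b"a", "loc-B": b"b"}
referenced = {"loc-B"}   # locations still referenced by metadata (assumed known)
deleted = cleanup(journal, referenced, data_store)
```

The journal only records candidates; the decision to delete is taken by the job, which re-checks references first. How that check would be done reliably against the real metadata backends is the part that still needs design.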
I haven't looked at the details of this PR, but I would like to mention that for S3C, not cleaning up orphans is a deliberate choice. Indeed, it's possible to end up with a dangling metadata entry, because when we get an error we cannot be sure that the metadata write actually failed. A dangling entry can cause serious issues for applications, or raise suspicion of data loss, because we cannot always know the history of that entry or whether it was affected by an error.
Maybe a better middle ground would be to defer the orphan cleanup until enough time has passed for the Metadata layer to settle or time out all its pending requests, then re-check the metadata state before deleting the orphan.
Also, when we have a good solution in mind, we should definitely consider applying it on 7.x branches (but we could do a later backport after more testing if we are concerned about the risk of regression on S3C).
@jonathan-gramain , do you mean we can have errors returned by metadata in the S3C case, and this approach for MongoDB is not safe for the 7.x branches? Or do you mean, even with MongoDB, we should not "trust" the errors returned by the driver, as it might report an error, while the metadata was actually written? Note that here, we tackle orphans in the storage, not in metadata.
Running something after a delay seems unsafe: in the meantime, other operations can change or delete this object's metadata in a way that would not solve our issue here. E.g.: we really fail to write the metadata at first, but data A is written to the storage. Then the client retries and succeeds: metadata is written and data B is stored. Then the cleanup job detects that the metadata is there, and does nothing. In the end, we have an orphan.
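The scenario above can be replayed with a small in-memory simulation. This is only a sketch: `naive_deferred_cleanup` is a hypothetical job that checks nothing more than whether metadata exists for the key, and the dicts stand in for the metadata DB and the data backend.

```python
def naive_deferred_cleanup(metadata, data_store, key, candidate_location):
    """Deferred job that only checks whether metadata exists for the key."""
    if key in metadata:
        return False  # metadata present: the job assumes nothing to clean
    data_store.pop(candidate_location, None)
    return True


metadata = {}
data_store = {}

# 1. First PUT: data A is stored, but the metadata write really fails.
data_store["loc-A"] = b"A"

# 2. The client retries and succeeds: data B is stored, metadata points to it.
data_store["loc-B"] = b"B"
metadata["obj"] = {"location": "loc-B"}

# 3. The deferred job runs later on behalf of the failed first attempt.
cleaned = naive_deferred_cleanup(metadata, data_store, "obj", "loc-A")

# Any stored location that no metadata entry references is an orphan.
referenced = {entry["location"] for entry in metadata.values()}
orphans = sorted(set(data_store) - referenced)
```

Here `cleaned` ends up `False` and `loc-A` remains in `orphans`: the job saw valid metadata for the key and left data A in place.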
Anyway, putting this work on hold, as we will need a unified solution for both branches (IMHO, in our APIs, since we should be able to rely fully on their return codes).