openstorage icon indicating copy to clipboard operation
openstorage copied to clipboard

PWX-23290: handle spec updates with stale config

Open sulakshm opened this issue 2 years ago • 4 comments

Signed-off-by: Lakshmi Narasimhan Sundararajan [email protected]

What this PR does / why we need it:

When volume spec updates are requested, there is a spec reconciliation based on the current state of the volume. This is done to validate policy settings. For some volume update operations the spec gets updated only upon action from px-storage and may take time to complete action. Ex. HA Update on a volume needs action from px-storage before the HA level change gets reflected in spec. So, if there is a subsequent volume update, it is possible for the reconciliation action part of update to pick stale config, and the final action may be inconsistent after.

To avoid this, whenever volume update is applied, only pass the relevant section of the spec that needs update, mask rest of the specs that have not been updated.

This window of stale config is open only for defered spec updated and long running actions to complete the update operations. Boolean based state are mostly instantaneous updates to the spec and does not seem to have this problem.

Which issue(s) this PR fixes (optional) Closes # PWX-23290 https://portworx.atlassian.net/browse/PWX-23290

Special notes for your reviewer: The changes have been tested in 2.10.0-ea branch where the issue was reported.

sulakshm avatar Apr 05 '22 08:04 sulakshm

This is definitely better than what we have now. What about cases like back to back ha updates where the 2nd update is issued right before the 1st one completes. We would still have a problem there rt?

prabirpaul avatar Apr 07 '22 18:04 prabirpaul

This is definitely better than what we have now. What about cases like back to back ha updates where the 2nd update is issued right before the 1st one completes. We would still have a problem there rt?

Ha Operations do update the spec to hand off request from px to px-storage, but it is a different part of the spec. There is a Recovery section that gets exchanged between px and px-storage. So any new Ha Operation always refer to below part of the spec. This portion will always get reflected for the ongoing Ha. So either the new Ha operation is serialized behind or rejected.

  "Recovery": {
    "ID": "437652889672355328",
    "State": 0,
    "AttachedOn": -1,
    "AggregationType": 1,
    "ReplicationSets": [
      {
        "ID": 0,
        "CreateSet": [
          0,
          1
        ],
        "CreateSetPoolIds": [
          0,
          0
        ],
        "ReplicaSetCurr": [
          0,
          1
        ],
        "ReplicaSetNew": null,
        "ReplicaSetNewPoolIds": null,
        "RemoveSet": [],
        "ReAddSet": [],
        "ReAddSetPoolIds": [],
        "ReplicaSetNewSourceNodes": null
      }
    ],
    "AbortOnError": false,
    "SnapRetry": false
  },

sulakshm avatar Apr 08 '22 05:04 sulakshm

This PR is stale because it has been in review for 3 days with no activity.

github-actions[bot] avatar Apr 12 '22 02:04 github-actions[bot]

This PR is stale because it has been open for 180 days with no activity. Update this PR or it will be automatically closed in 6 months.

github-actions[bot] avatar Jul 21 '22 02:07 github-actions[bot]