web3.storage Give up trying to pin a CID after a given time threshold

If you send a pinning request for a CID that doesn't exist, or whose node is offline, the pin stays stuck in "queued" status. We should abandon the operation after a given time threshold has passed.

Update

This can be taken care of by the Pinning API in Elastic Provider, when it takes over from Cluster.

Impact

(Infra) Decrease load on Cluster, which translates to a decreased use of resources
(Biz) Reduce chances of an overwhelmed cluster in the near future
(User) Automatically cleaning up hanging requests means less housekeeping for users, who would otherwise have to "clean up" those requests themselves.

Acceptance Criteria

[ ] After a given time threshold giveUpThreshold, the Cluster should stop trying to fetch and pin a given CID if there are no more recent PinningRequests or Uploads for the same CID.
[ ] PinningRequests created more than giveUpThreshold ago should report a failed status if there are no more recent PinningRequests.
[ ] PinningRequests created less than giveUpThreshold ago should report their effective status, based on cluster state.
[ ] Ability to clean up existing PinningRequests.
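The first three criteria could be sketched roughly as below. This is only an illustration, not how the API is implemented: `PinRequest`, `effectiveStatus`, `latestActivityAt` and `giveUpThresholdMs` are hypothetical names, and the 1 day default is just the value suggested in the notes.

```typescript
// Hypothetical types and names, for illustration only.
type PsaStatus = "queued" | "pinning" | "pinned" | "failed";

interface PinRequest {
  cid: string;
  createdAt: Date;
}

const DAY_MS = 24 * 60 * 60 * 1000;

/**
 * Decide which status to report for a pin request.
 *
 * - clusterStatus: what the cluster currently reports for the CID.
 * - latestActivityAt: creation time of the most recent PinningRequest
 *   or Upload for the same CID (assumed to be looked up by the caller).
 */
function effectiveStatus(
  request: PinRequest,
  clusterStatus: PsaStatus,
  latestActivityAt: Date,
  now: Date,
  giveUpThresholdMs: number = DAY_MS,
): PsaStatus {
  // A pin that succeeded is never downgraded.
  if (clusterStatus === "pinned") return "pinned";

  const cutoff = now.getTime() - giveUpThresholdMs;
  const expired = request.createdAt.getTime() < cutoff;
  const hasRecentActivity = latestActivityAt.getTime() >= cutoff;

  // Older than the threshold and nothing newer for the same CID: give up.
  if (expired && !hasRecentActivity) return "failed";

  // Otherwise report the effective status from the cluster.
  return clusterStatus;
}
```

Passing `now` in explicitly keeps the function pure, which should make the threshold behaviour easy to unit test.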

Notes.

What happens if there's a pinning request for CID_A which is "expired", but a chunked upload for the same CID_A exists? In this case, there are 2 possible scenarios:
- A chunked upload is in progress
- A chunked upload has failed in practice
Consider removing nonexistent CIDs from the content table.
The suggested value for giveUpThreshold is 1 day. It could be even smaller; let's parametrise it so it's easy to update.
At the moment the cluster can report transient failed states. I wonder whether those should be reflected in the PSA statuses at all. We should consider never sending a failed status until the threshold is reached.
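A minimal sketch of the last two notes, assuming the threshold arrives as a raw string (for example from an environment variable; no variable name is defined anywhere yet) and that a transient failure is reported as "queued" until the threshold passes. `parseGiveUpThresholdMs` and `maskTransientFailure` are hypothetical helpers, not existing functions.

```typescript
// Hypothetical helpers, for illustration only.
type PsaStatus = "queued" | "pinning" | "pinned" | "failed";

// Suggested default from the notes above: 1 day.
const DEFAULT_GIVE_UP_THRESHOLD_MS = 24 * 60 * 60 * 1000;

/**
 * Parse a configured threshold in milliseconds, falling back to the
 * default when the value is missing, not a number, or not positive.
 */
function parseGiveUpThresholdMs(
  raw: string | undefined,
  fallbackMs: number = DEFAULT_GIVE_UP_THRESHOLD_MS,
): number {
  const parsed = Number(raw);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : fallbackMs;
}

/**
 * Hide a "failed" state reported by the cluster while the request is
 * still younger than the threshold, since the failure may be transient.
 * Reporting "queued" in that window is an assumption of this sketch.
 */
function maskTransientFailure(
  clusterStatus: PsaStatus,
  createdAt: Date,
  now: Date,
  giveUpThresholdMs: number,
): PsaStatus {
  const ageMs = now.getTime() - createdAt.getTime();
  if (clusterStatus === "failed" && ageMs < giveUpThresholdMs) {
    return "queued";
  }
  return clusterStatus;
}
```

Whether "queued" or "pinning" is the better stand-in for a masked failure is an open question.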

Dec 17 '21 11:12 mbommerez

To be discussed with @alanshaw

Jan 28 '22 10:01 mbommerez

Discussed with @alanshaw @flea89 @francois-potato.

Everything that cannot be pinned is added to a separate queue that keeps growing. In the meantime, the cluster keeps trying to pin those CIDs. This is not an immediate concern, but in the future the cluster might fall over if the queue grows too much.

We need to find a way for these CIDs to be dropped from cluster.

We need to define the threshold (i.e. how much time has elapsed, not how many attempts were made). We also need to find a way to surface this information to the user, as a sort of perma-failed status.

Apr 29 '22 10:04 mbommerez
