ipfs-cluster
IPFS being offline causes cascades of errors and potential memory usage bubble
Our call to "ipfs pin ls" streams pins which we collect in a map.
If the request "dies" half-way (because IPFS dies), we end up with a map that does not have all the things it should and an error in the logs.
If this happens during a regular RecoverAll() check, the code may conclude that a huge number of IPFS pins are missing. All of those items are then queued for pinning, which means they are held in memory.
While the queue is filling up, we also attempt to pin things. Every request we open to IPFS fails immediately, which causes heavy load and a flood of errors while memory usage balloons.
Cluster should be aware when IPFS is not reachable (connection refused) and introduce some sort of delay / retry logic so that it is not possible to hammer a dead node the way we do now. The ipfsconnector is probably the best place for this logic, as it is the component that makes requests to IPFS and has common methods for doing so.
The problem of too many things being queued because ipfs-pins entries are missing in the pintracker is separate. It involves surfacing and acting on pin-ls streaming errors, so that we abort StatusAll calls when they happen.