Failed to download batch from IPFS
During a recent performance run, the job failed to start because the orgs were not registered properly.
Node 1 shows this upload:
{"log":"[2022-08-17T21:44:05.067Z] INFO IPFS published QmZV1813npCEb8USvUJXHmnLNbtQgjLgZA8TPF83P1pJ68 Size=1415 d=pinned_broadcast ns=default opcache=juVW47Fq p=did:firefly:org/org_1| pid=1 role=batchmgr\n","stream":"stderr","time":"2022-08-17T21:44:05.068084982Z"}
{"log":"[2022-08-17T21:44:05.067Z] INFO Published batch 'f4b9cba6-e30b-4cf9-a143-e9ac425dc2d1' to shared storage: 'QmZV1813npCEb8USvUJXHmnLNbtQgjLgZA8TPF83P1pJ68' d=pinned_broadcast ns=default opcache=juVW47Fq p=did:firefly:org/org_1| pid=1 role=batchmgr\n","stream":"stderr","time":"2022-08-17T21:44:05.068104641Z"}
Node 0 is repeatedly unable to download:
{"log":"[2022-08-17T21:44:07.861Z] DEBUG ==\u003e GET http://ipfs_0:8080/ipfs/QmZV1813npCEb8USvUJXHmnLNbtQgjLgZA8TPF83P1pJ68 breq=BAGcvc3U pid=1 sharedstorage=ipfs\n","stream":"stderr","time":"2022-08-17T21:44:07.862328371Z"}
{"log":"[2022-08-17T21:44:37.862Z] DEBUG \u003c== GET http://ipfs_0:8080/ipfs/QmZV1813npCEb8USvUJXHmnLNbtQgjLgZA8TPF83P1pJ68 [0] (30001.16ms) breq=BAGcvc3U pid=1 sharedstorage=ipfs\n","stream":"stderr","time":"2022-08-17T21:44:37.862481893Z"}
{"log":"[2022-08-17T21:44:37.862Z] DEBUG ipfs updating operation default:f9ffe35b-28de-446b-a849-177db05d3134 status=Pending error=FF10376: Error downloading data with reference 'QmZV1813npCEb8USvUJXHmnLNbtQgjLgZA8TPF83P1pJ68' from shared storage: FF10136: Error from IPFS: : Get \"http://ipfs_0:8080/ipfs/QmZV1813npCEb8USvUJXHmnLNbtQgjLgZA8TPF83P1pJ68\": context deadline exceeded (Client.Timeout exceeded while awaiting headers) ns=default pid=1\n","stream":"stderr","time":"2022-08-17T21:44:37.862524061Z"}
{"log":"[2022-08-17T21:44:37.862Z] ERROR Download operation sharedstorage_download_batch/f9ffe35b-28de-446b-a849-177db05d3134 attempt=1/100 failed: FF10376: Error downloading data with reference 'QmZV1813npCEb8USvUJXHmnLNbtQgjLgZA8TPF83P1pJ68' from shared storage: FF10136: Error from IPFS: : Get \"http://ipfs_0:8080/ipfs/QmZV1813npCEb8USvUJXHmnLNbtQgjLgZA8TPF83P1pJ68\": context deadline exceeded (Client.Timeout exceeded while awaiting headers) downloadworker=dw_007 ns=default pid=1\n","stream":"stderr","time":"2022-08-17T21:44:37.862698724Z"}
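For anyone digging through the attached logs, here is a minimal sketch for pulling the status and elapsed time out of the `<== GET` response traces above. It assumes the Docker JSON log format shown in this issue; `parse_response` and `RESPONSE_RE` are hypothetical helper names, not part of FireFly.

```python
import json
import re

# Matches the "<== GET <url> [<status>] (<elapsed>ms)" response trace
# as it appears in the log messages quoted in this issue.
RESPONSE_RE = re.compile(
    r"<== GET (?P<url>\S+) \[(?P<status>\d+)\] \((?P<ms>[\d.]+)ms\)"
)


def parse_response(raw_line):
    """Return (url, status, elapsed_ms) for a response trace line, else None."""
    # json.loads decodes the escaped \u003c back into "<".
    message = json.loads(raw_line)["log"]
    match = RESPONSE_RE.search(message)
    if match is None:
        return None
    return match["url"], int(match["status"]), float(match["ms"])


# The Node 0 response line from above, copied verbatim.
sample = r'{"log":"[2022-08-17T21:44:37.862Z] DEBUG \u003c== GET http://ipfs_0:8080/ipfs/QmZV1813npCEb8USvUJXHmnLNbtQgjLgZA8TPF83P1pJ68 [0] (30001.16ms) breq=BAGcvc3U pid=1 sharedstorage=ipfs\n","stream":"stderr","time":"2022-08-17T21:44:37.862481893Z"}'

url, status, elapsed_ms = parse_response(sample)
print(url, status, elapsed_ms)
```

A status of `0` with an elapsed time of roughly 30 seconds lines up with the `Client.Timeout exceeded while awaiting headers` error: the gateway never returned headers before the client gave up.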
log_firefly_core_0.log.gz log_firefly_core_1.log.gz
Unfortunately, I did not capture the IPFS logs. However, I'm fairly certain IPFS was up and not logging any obvious anomalies.
I've also seen this locally at least once, so it wasn't a totally isolated incident.
On the surface, it looks like the IPFS network is not healthy.
Each time a download request is made against Node 0, its IPFS node should reach out to its peers to find the data, and Node 1 should have knowledge of that data in its DAG.
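One way to check that expectation the next time this happens is to request the same CID from each node's IPFS gateway with a short timeout and compare the results. This is only a sketch: `probe_gateway` is a hypothetical helper, and any gateway address other than `http://ipfs_0:8080` (the one in the logs above) is an assumption about the environment.

```python
import time
import urllib.error
import urllib.request


def probe_gateway(base_url, cid, timeout=5.0):
    """Try to GET <base_url>/ipfs/<cid>; return (ok, elapsed_seconds, detail)."""
    url = f"{base_url.rstrip('/')}/ipfs/{cid}"
    started = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            response.read(1)  # one byte is enough to prove the content resolved
            ok, detail = True, f"HTTP {response.status}"
    except urllib.error.HTTPError as err:
        # The gateway answered, but with an error status.
        ok, detail = False, f"HTTP {err.code}"
    except OSError as err:
        # Covers timeouts, DNS failures, and refused connections.
        ok, detail = False, f"no response: {err}"
    return ok, time.monotonic() - started, detail
```

For example, calling `probe_gateway("http://ipfs_0:8080", "QmZV1813npCEb8USvUJXHmnLNbtQgjLgZA8TPF83P1pJ68")` and then the same call against Node 1's gateway (hostname assumed) would show whether the content resolves on the publishing node but not on the downloading one, which would point at peering between the IPFS nodes rather than at FireFly.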