CIDs Not Reprovided Automatically in Helia After Initial Provide Call
Opening this issue following a conversation with @2color
Description:
We're encountering an issue with our Helia-based IPFS implementation in Distributed Press (https://github.com/hyphacoop/api.distributed.press/pull/101). Newly published CIDs are accessible via gateways, but older CIDs become undiscoverable after roughly 48 hours, with errors like "Could not find the multihash in DHT or IPNI". By contrast, our understanding is that Kubo automatically reprovides CIDs to keep them discoverable. We're unsure whether reproviding in Helia is manual or whether we're missing a configuration option that enables automatic reproviding similar to Kubo's.
Key Question:
We expected the default 24-hour reproviding logic to handle this automatically.
- When we restart our staging server, it republishes the CIDs available at https://dp.chanterelle.xyz/v1/sites.
- Our server restarts every 6 days, so content becomes undiscoverable in the interim.
- But the default reprovide threshold is 24 hours — so shouldn't Helia reprovide automatically?
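To make the timing concrete, the sketch below models one restart cycle. It assumes a provider record stays findable for 48 hours after the last successful provide (matching the ~48 hours observed above), a 6-day (144 h) restart cycle, and that, without a working reprovider, the only provides happen at restart. `provideTimes` and `uncoveredHours` are hypothetical helpers, not Helia APIs.

```typescript
// Hypothetical model, not Helia code. Assumption: a provider record is findable for
// `validityHours` after each successful provide; times are hours since the last restart.

/** Provide times within one cycle: only at restart, or also every `reprovideEveryHours`. */
function provideTimes(cycleHours: number, reprovideEveryHours?: number): number[] {
  if (reprovideEveryHours === undefined) return [0]
  if (reprovideEveryHours <= 0) throw new RangeError('reprovide interval must be positive')
  const times: number[] = []
  for (let t = 0; t < cycleHours; t += reprovideEveryHours) times.push(t)
  return times
}

/** Hours in [0, cycleHours) not covered by any still-valid provider record. */
function uncoveredHours(cycleHours: number, validityHours: number, times: number[]): number {
  let covered = 0
  let coveredUntil = 0
  for (const t of [...times].sort((a, b) => a - b)) {
    const start = Math.max(t, coveredUntil)
    const end = Math.min(t + validityHours, cycleHours)
    if (end > start) {
      covered += end - start
      coveredUntil = end
    }
  }
  return cycleHours - covered
}

// Provide only at restart: 144 - 48 = 96 hours per cycle with no findable record.
console.log(uncoveredHours(144, 48, provideTimes(144))) // 96
// Reprovide every 24 h (the expected default behaviour): no gap.
console.log(uncoveredHours(144, 48, provideTimes(144, 24))) // 0
```

Under these assumptions, providing only at restart leaves content unfindable for two thirds of each cycle, which matches the symptom. A working 24-hour reprovide would close the gap entirely.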
Main Question:
Is reproviding in Helia manual — requiring explicit provide(cid) calls — or does kadDHT automatically reprovide CIDs once initially provided?
If it's automatic, are we missing configuration to make it work reliably beyond 48 hours?
Any tips on debugging or resolving this would be greatly appreciated!
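Conceptually, an automatic reprovider can only re-announce the provider records it can enumerate from its datastore. The sketch below is a simplified model of one reprovide pass, not the @libp2p/kad-dht implementation. The `StoredProviderRecord` shape, the key layout, and the rule "reprovide when a record is older than the threshold" are assumptions for illustration. The point it makes is that records the datastore query never returns are never reprovided.

```typescript
// Simplified model of a reprovide pass (assumed logic, not the @libp2p/kad-dht source).
interface StoredProviderRecord {
  key: string // assumed layout: '/dht/provider/<multihash>/<peerId>'
  lastProvidedAt: number // ms since epoch of the last successful provide
}

type Decision = 'reprovide' | 'keep'

// Decide what to do with every record whose key starts with `prefix`.
function planReprovidePass(
  records: Iterable<StoredProviderRecord>,
  prefix: string,
  now: number,
  thresholdMs: number
): Map<string, Decision> {
  const plan = new Map<string, Decision>()
  for (const record of records) {
    if (!record.key.startsWith(prefix)) continue
    const age = now - record.lastProvidedAt
    plan.set(record.key, age >= thresholdMs ? 'reprovide' : 'keep')
  }
  return plan
}

const HOUR = 60 * 60 * 1000
const now = Date.now()
const stored: StoredProviderRecord[] = [
  { key: '/dht/provider/mhA/self', lastProvidedAt: now - 25 * HOUR },
  { key: '/dht/provider/mhB/self', lastProvidedAt: now - 1 * HOUR }
]

// mhA is past the assumed 24 h threshold and gets reprovided; mhB is kept as-is.
console.log(planReprovidePass(stored, '/dht/provider', now, 24 * HOUR))
// If the datastore query yields nothing, the pass has nothing to act on.
console.log(planReprovidePass([], '/dht/provider', now, 24 * HOUR).size) // 0
```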
IPFS check:
- For latest published: https://check.ipfs.network/?cid=bafybeihsxenza22qg6dxqcxx26i7vdaj24ckx6ruwtykf2cj6a3clescsy&multiaddr=%2Fp2p%2F12D3KooWJPjj39DFCLhw81XooR6bgLukRmXLxJtvNY9Muwb3cfws&ipniIndexer=https%3A%2F%2Fcid.contact&timeoutSeconds=60&httpRetrieval=on
  EDIT: This has now been garbage collected; the check reports "Could not find the multihash in DHT or IPNI".
- For older: https://check.ipfs.network/?cid=bafybeih5t7kfe5ijg4sxzmdmzmgn32nlxxxv25tjfnh6r4t74jiolgl2cy&multiaddr=%2Fp2p%2F12D3KooWJPjj39DFCLhw81XooR6bgLukRmXLxJtvNY9Muwb3cfws&ipniIndexer=https%3A%2F%2Fcid.contact&timeoutSeconds=60&httpRetrieval=on
Code:
- Helia integration config: https://github.com/hyphacoop/api.distributed.press/blob/helia/protocols/ipfs.ts
- Reprovide config: https://github.com/hyphacoop/api.distributed.press/blob/9f1c6af522314d93ee4bd3d87934ed189d0516b3/protocols/ipfs.ts#L191
- We call helia.libp2p.contentRouting.provide(cid) with retries: https://github.com/hyphacoop/api.distributed.press/blob/9f1c6af522314d93ee4bd3d87934ed189d0516b3/protocols/ipfs.ts#L325
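For reference, a generic retry wrapper of the kind described might look like the sketch below. The `provide` callback is injected, so in practice it could wrap something like `helia.routing.provide(cid)`. The attempt count and exponential backoff are arbitrary assumptions, not taken from the linked code. Retries only make the initial announcement more reliable: a record that was provided successfully still has to be reprovided before it expires.

```typescript
// Hypothetical helper: retry an async provide call with exponential backoff.
// `provide` is injected, e.g. () => helia.routing.provide(cid) (assumed usage).
async function provideWithRetries(
  provide: () => Promise<void>,
  maxAttempts = 3,
  baseDelayMs = 1000
): Promise<number> {
  let lastError: unknown
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      await provide()
      return attempt // number of attempts it took to succeed
    } catch (err) {
      lastError = err
      if (attempt < maxAttempts) {
        // Wait baseDelayMs, 2 * baseDelayMs, 4 * baseDelayMs, ... between attempts.
        await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** (attempt - 1)))
      }
    }
  }
  throw lastError // all attempts failed: surface the last error
}

// Usage with a fake provide that fails twice before succeeding.
let calls = 0
const flakyProvide = async (): Promise<void> => {
  calls++
  if (calls < 3) throw new Error(`attempt ${calls} failed`)
}
provideWithRetries(flakyProvide, 5, 10).then((attempts) => {
  console.log(`provided after ${attempts} attempts`) // provided after 3 attempts
})
```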
I think I've found the root cause of this, which I came across during debugging, somewhat coincidentally.
The FsDatastore does not yield any entries when the query method is called with a prefix option.
For example, with the following script the periodic query never logs any provider records:
```js
import { unixfs } from '@helia/unixfs'
import { loadOrCreateSelfKey } from '@libp2p/config'
import { kadDHT } from '@libp2p/kad-dht'
import { enable } from '@libp2p/logger'
import { FsBlockstore } from 'blockstore-fs'
import { FsDatastore } from 'datastore-fs'
import { createHelia, libp2pDefaults } from 'helia'

const datastore = new FsDatastore('./datastore')
const blockstore = new FsBlockstore('./blockstore')

const libp2pConf = libp2pDefaults()
libp2pConf.services.dht = kadDHT({
  reprovide: {
    interval: 1000 * 60 * 1 // reprovide every minute
  }
})
libp2pConf.privateKey = await loadOrCreateSelfKey(datastore)

enable('libp2p:kad-dht:reprovider*,libp2p:kad-dht:reprovider*:trace,libp2p:kad-dht*,-libp2p:kad-dht:query*,-libp2p:kad-dht:trace,-libp2p:kad-dht:routing-table*,-libp2p:kad-dht:topology-listener*,-libp2p:kad-dht:network*,-libp2p:kad-dht:peer-routing*') // less noisy

const helia = await createHelia({
  blockstore,
  datastore,
  libp2p: libp2pConf
})
console.log('Created Helia node with PeerID:', helia.libp2p.peerId.toString())

// create a filesystem on top of Helia, in this case it's UnixFS
const fs = unixfs(helia)
const encoder = new TextEncoder()
const cid = await fs.addFile({
  content: encoder.encode('Hello World 🗺️🌎🌍🌏 402!'),
  path: './hello-world.txt'
})

setInterval(async () => {
  console.log('searching provider records in the datastore...')
  for await (const entry of helia.datastore.query({
    prefix: '/dht/provider'
  })) {
    console.log('--- Datastore entry:', entry.key.toString(), entry.value.toString())
  }
}, 1000 * 20) // log every 20 seconds

// Provide the block to the DHT so that other nodes can find and retrieve it
await helia.routing.provide(cid)
console.log('CID provided to the DHT:', cid.toString())
```
On the other hand, if Helia is instantiated with the default datastore, i.e. without passing a datastore option:
```js
const helia = await createHelia({
  blockstore,
  libp2p: libp2pConf
})
// ...
```
The following call to the query function correctly yields the provider record for the CID:
```js
for await (const entry of helia.datastore.query({
  prefix: '/dht/provider'
})) {
  console.log('--- Datastore entry:', entry.key.toString())
}
```
I believe it's due to the way that this prefix is handled by the different datastore implementations.
FsDatastore inherits from the base class (https://github.com/ipfs/js-stores/blob/main/packages/datastore-core/src/base.ts#L86), whose query method calls ._all; FsDatastore's ._all converts the prefix into a glob pattern that is passed to 'it-glob'.
The datastore prefix is defined here (as part of the reprovider class) https://github.com/libp2p/js-libp2p/blob/main/packages/kad-dht/src/reprovider.ts/#L80C33-L80C33
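If the analysis above is right, the mismatch can be modelled in a few lines. This sketch is illustrative only and rests on assumptions not verified here against the source. It assumes kad-dht stores provider records under keys nested two levels below the prefix (`/dht/provider/<multihash>/<peerId>`). It also assumes FsDatastore stores each key as a file with a `.data` extension and builds a glob pattern of the form `<prefix>/*<extension>`. Under those assumptions, a plain string-prefix filter (which seems to be how the default datastore behaves in the test above) finds the nested key, while a single-level glob does not.

```typescript
// Illustrative model only; key layout, '.data' extension and glob shape are assumptions.

// Prefix semantics that worked in this thread: any key starting with the prefix matches.
function prefixMatches(key: string, prefix: string): boolean {
  return key.startsWith(prefix)
}

// Minimal glob-to-RegExp where '*' matches any run of characters except '/'.
function globToRegExp(pattern: string): RegExp {
  const escaped = pattern
    .replace(/[.+^${}()|[\]\\?]/g, '\\$&') // escape regex metacharacters except '*'
    .replace(/\*/g, '[^/]*')
  return new RegExp(`^${escaped}$`)
}

// Assumed filesystem behaviour: key -> relative file path, prefix -> '<prefix>/*<ext>'.
function singleLevelGlobMatches(key: string, prefix: string, extension = '.data'): boolean {
  const file = `${key.replace(/^\/+/, '')}${extension}`
  const pattern = `${prefix.replace(/^\/+/, '')}/*${extension}`
  return globToRegExp(pattern).test(file)
}

const nestedKey = '/dht/provider/HYPOTHETICAL_MULTIHASH/HYPOTHETICAL_PEER_ID'
console.log(prefixMatches(nestedKey, '/dht/provider')) // true
console.log(singleLevelGlobMatches(nestedKey, '/dht/provider')) // false
```

If this is indeed the cause, a recursive pattern such as `<prefix>/**/*<extension>` would be one way to pick up nested keys. Whether that is the right fix is the open question in the next steps.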
Next steps
- Figure out what form the prefix string is expected to take, and whether it's acceptable for store implementations to deviate from that convention.
- Potentially rectify the inconsistency in the https://github.com/ipfs/js-stores/ repo.
Interesting. I would definitely switch to datastore-level if that is an option.
@achingbrain thanks. Any thoughts on how we should address the prefix discrepancy between implementations?
A test that exposes the bug should be added to the datastore interface test suite, along with a patch to fix it.
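A minimal sketch of the kind of check such a test could make is shown below. It uses a hypothetical, simplified store interface with string keys; the real interface-datastore API works with Key objects and has its own test-suite helpers, which are not reproduced here. The idea is to put keys nested more than one level below a prefix, query with that prefix, and require every nested key back.

```typescript
// Hypothetical, simplified store interface (string keys); not the real interface-datastore API.
interface SimpleStore {
  put(key: string, value: Uint8Array): Promise<void>
  query(options: { prefix?: string }): AsyncIterable<{ key: string, value: Uint8Array }>
}

// Reference in-memory store: a prefix query is a plain string-prefix filter.
class MapStore implements SimpleStore {
  private readonly entries = new Map<string, Uint8Array>()

  async put(key: string, value: Uint8Array): Promise<void> {
    this.entries.set(key, value)
  }

  async * query(options: { prefix?: string }): AsyncIterable<{ key: string, value: Uint8Array }> {
    for (const [key, value] of this.entries) {
      if (options.prefix === undefined || key.startsWith(options.prefix)) yield { key, value }
    }
  }
}

// The check: keys nested more than one level below the prefix must still be returned.
async function checkNestedPrefixQuery(store: SimpleStore): Promise<void> {
  const nested = ['/dht/provider/mh1/peerA', '/dht/provider/mh2/peerB']
  for (const key of nested) await store.put(key, new Uint8Array([1]))
  await store.put('/other/key', new Uint8Array([2]))

  const found: string[] = []
  for await (const entry of store.query({ prefix: '/dht/provider' })) found.push(entry.key)
  found.sort()

  if (JSON.stringify(found) !== JSON.stringify(nested)) {
    throw new Error(`prefix query returned ${JSON.stringify(found)}, expected ${JSON.stringify(nested)}`)
  }
}

checkNestedPrefixQuery(new MapStore()).then(() => console.log('nested prefix query OK'))
```

Run against a store whose prefix query only lists direct children, the same check would throw, which is the behaviour the suite would need to catch.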
@akhileshthite Please try datastore-level together with blockstore-fs (for consistency).
I can confirm that the following works:
```js
import { unixfs } from '@helia/unixfs'
import { inspectorMetrics } from '@ipshipyard/libp2p-inspector-metrics'
import { loadOrCreateSelfKey } from '@libp2p/config'
import { kadDHT } from '@libp2p/kad-dht'
import { enable } from '@libp2p/logger'
import { FsBlockstore } from 'blockstore-fs'
import { LevelDatastore } from 'datastore-level'
import { createHelia, libp2pDefaults } from 'helia'

const datastore = new LevelDatastore('./datastore')
const blockstore = new FsBlockstore('./blockstore')

const libp2pConf = libp2pDefaults()
libp2pConf.metrics = inspectorMetrics()
libp2pConf.services.dht = kadDHT({
  reprovide: {
    interval: 1000 * 60 * 1 // reprovide every minute
  }
})
libp2pConf.privateKey = await loadOrCreateSelfKey(datastore)

enable('libp2p:kad-dht:reprovider*,libp2p:kad-dht:reprovider*:trace,libp2p:kad-dht*,-libp2p:kad-dht:query*,-libp2p:kad-dht:trace,-libp2p:kad-dht:routing-table*,-libp2p:kad-dht:topology-listener*,-libp2p:kad-dht:network*,-libp2p:kad-dht:peer-routing*') // less noisy

const helia = await createHelia({
  blockstore,
  datastore,
  libp2p: libp2pConf
})
console.log('Created Helia node with PeerID:', helia.libp2p.peerId.toString())

// create a filesystem on top of Helia, in this case it's UnixFS
const fs = unixfs(helia)
const encoder = new TextEncoder()
const cid = await fs.addFile({
  content: encoder.encode('Hello World 🗺️🌎🌍🌏 402!'),
  path: './hello-world.txt'
})

setInterval(async () => {
  console.log('listing provider records in the datastore...')
  for await (const entry of helia.datastore.query({
    prefix: '/dht/provider'
  })) {
    console.log('--- Datastore entry:', entry.key.toString(), entry.value.toString())
  }
}, 1000 * 1) // log every second

// Provide the block to the DHT so that other nodes can find and retrieve it
await helia.routing.provide(cid)
console.log('CID provided to the DHT:', cid.toString())
```
This is looking good 👀
CC @tripledoublev
> Interesting. I would definitely switch to datastore-level if that is an option.
Could you elaborate on why this is recommended? 👀