CIDs Not Reprovided Automatically in Helia After Initial Provide Call

Open akhileshthite opened this issue 4 months ago • 10 comments

Opening this issue following a conversation with @2color

Description:

We're encountering an issue with our Helia-based IPFS implementation in Distributed Press (https://github.com/hyphacoop/api.distributed.press/pull/101): newly published CIDs are accessible via gateways, but older CIDs (after ~48 hours) become undiscoverable, resulting in errors like "Could not find the multihash in DHT or IPNI". In contrast, we understand that Kubo automatically reprovides CIDs to maintain discoverability. We're unsure whether reproviding in Helia is manual or whether we're missing a configuration that enables automatic reproviding similar to Kubo's.

Key Question:

We expected the default 24-hour reproviding logic to handle this automatically.

  • When we restart our staging server, it republishes the CIDs available at https://dp.chanterelle.xyz/v1/sites.
  • Our server restarts every 6 days, so content goes undiscoverable in the interim.
  • But the default reprovide threshold is 24 hours — so shouldn't Helia reprovide automatically?

Main Question:

Is reproviding in Helia manual — requiring explicit provide(cid) calls — or does kadDHT automatically reprovide CIDs once initially provided? If it's automatic, are we missing configuration to make it work reliably beyond 48 hours?

Any tips on debugging or resolving this would be greatly appreciated!

IPFS check:

  • For latest published: https://check.ipfs.network/?cid=bafybeihsxenza22qg6dxqcxx26i7vdaj24ckx6ruwtykf2cj6a3clescsy&multiaddr=%2Fp2p%2F12D3KooWJPjj39DFCLhw81XooR6bgLukRmXLxJtvNY9Muwb3cfws&ipniIndexer=https%3A%2F%2Fcid.contact&timeoutSeconds=60&httpRetrieval=on EDIT: This is now Garbage Collected; Could not find the multihash in DHT or IPNI

  • For older: https://check.ipfs.network/?cid=bafybeih5t7kfe5ijg4sxzmdmzmgn32nlxxxv25tjfnh6r4t74jiolgl2cy&multiaddr=%2Fp2p%2F12D3KooWJPjj39DFCLhw81XooR6bgLukRmXLxJtvNY9Muwb3cfws&ipniIndexer=https%3A%2F%2Fcid.contact&timeoutSeconds=60&httpRetrieval=on

Code:

  • Helia integration config: https://github.com/hyphacoop/api.distributed.press/blob/helia/protocols/ipfs.ts
  • Reprovide config: https://github.com/hyphacoop/api.distributed.press/blob/9f1c6af522314d93ee4bd3d87934ed189d0516b3/protocols/ipfs.ts#L191
  • We call helia.libp2p.contentRouting.provide(cid) with retries: https://github.com/hyphacoop/api.distributed.press/blob/9f1c6af522314d93ee4bd3d87934ed189d0516b3/protocols/ipfs.ts#L325
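For readers who don't want to follow the link, here is a minimal sketch of what "provide with retries" could look like. It is not the code from the linked file: the helper name provideWithRetries and its retries/delayMs options are hypothetical. The only call taken from this thread is helia.libp2p.contentRouting.provide(cid), shown in the usage comment.

```javascript
// Hypothetical helper (not from the linked repository): calls an async
// `provide` function and retries with a fixed delay if it rejects.
async function provideWithRetries (provide, cid, { retries = 3, delayMs = 1000 } = {}) {
  let lastError
  for (let attempt = 1; attempt <= retries; attempt++) {
    try {
      await provide(cid)
      return attempt // number of attempts that were needed
    } catch (err) {
      lastError = err
      if (attempt < retries) {
        // wait before the next attempt
        await new Promise(resolve => setTimeout(resolve, delayMs))
      }
    }
  }
  // every attempt failed: surface the last error to the caller
  throw lastError
}

// Usage (assumes an existing Helia node named `helia` and a CID `cid`):
// await provideWithRetries(c => helia.libp2p.contentRouting.provide(c), cid)
```

Note that retrying only covers the initial provide call. It does not by itself keep provider records alive after they expire.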

akhileshthite avatar Jul 23 '25 06:07 akhileshthite

[Two images attached]

akhileshthite avatar Jul 23 '25 06:07 akhileshthite

I think I've found the root cause, which I came across somewhat coincidentally while debugging.

The FsDatastore does not yield any entries when the query method is called with a prefix option.

For example, in the following script the datastore query never logs a provider record:

import { unixfs } from '@helia/unixfs'
import { loadOrCreateSelfKey } from '@libp2p/config'
import { kadDHT } from '@libp2p/kad-dht'
import { enable } from '@libp2p/logger'
import { FsBlockstore } from 'blockstore-fs'
import { FsDatastore } from 'datastore-fs'
import { createHelia, libp2pDefaults } from 'helia'

const datastore = new FsDatastore('./datastore')
const blockstore = new FsBlockstore('./blockstore')

const libp2pConf = libp2pDefaults()
libp2pConf.services.dht = kadDHT({
  reprovide: {
    interval: 1000 * 60 * 1 // reprovide every minute
  }
})
libp2pConf.privateKey = await loadOrCreateSelfKey(datastore)

enable('libp2p:kad-dht:reprovider*,libp2p:kad-dht:reprovider*:trace,libp2p:kad-dht*,-libp2p:kad-dht:query*,-libp2p:kad-dht:trace,-libp2p:kad-dht:routing-table*,-libp2p:kad-dht:topology-listener*,-libp2p:kad-dht:network*,-libp2p:kad-dht:peer-routing*') // less noisy

const helia = await createHelia({
  blockstore,
  datastore,
  libp2p: libp2pConf
})
console.log('Created Helia node with PeerID:', helia.libp2p.peerId.toString())

// create a filesystem on top of Helia, in this case it's UnixFS
const fs = unixfs(helia)
const encoder = new TextEncoder()

const cid = await fs.addFile({
  content: encoder.encode('Hello World 🗺️🌎🌍🌏 402!'),
  path: './hello-world.txt'
})

setInterval(async () => {
  console.log('searching provider records in the datastore...')
  for await (const entry of helia.datastore.query({
    prefix: '/dht/provider'
  })) {
    console.log('--- Datastore entry:', entry.key.toString(), entry.value.toString())
  }
}, 1000 * 20) // log every 20 seconds

// Provide the block to the DHT so that other nodes can find and retrieve it
await helia.routing.provide(cid)
console.log('CID provided to the DHT:', cid.toString())

On the other hand, if Helia is instantiated with the default Datastore, i.e.

const helia = await createHelia({
  blockstore,
  libp2p: libp2pConf
})

...

The following call to the query function correctly yields the provider record for the CID:

for await (const entry of helia.datastore.query({
  prefix: '/dht/provider'
})) {
  console.log('--- Datastore entry:', entry.key.toString())
}

2color avatar Aug 05 '25 16:08 2color

I believe it's due to the way that this prefix is handled by the different datastore implementations.

FsDatastore inherits from the base: https://github.com/ipfs/js-stores/blob/main/packages/datastore-core/src/base.ts#L86

The base query method calls ._all, which in FsDatastore converts the prefix to a glob pattern that is passed to 'it-glob'.

The datastore prefix is defined here (as part of the reprovider class) https://github.com/libp2p/js-libp2p/blob/main/packages/kad-dht/src/reprovider.ts/#L80C33-L80C33
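To illustrate how two stores could disagree about the same prefix, here is a purely hypothetical sketch. It is not the datastore-fs or datastore-core source, and the thread does not confirm that this is the exact cause. It only contrasts a plain string-prefix match with a single-level glob such as "<prefix>/*", under the assumption that "*" does not cross "/" boundaries. Both function names are made up for illustration.

```javascript
// Hypothetical illustration only: NOT the datastore-fs or datastore-core source.

// Interpretation 1: treat the prefix as a plain string prefix of the key.
function matchesStringPrefix (key, prefix) {
  return key.startsWith(prefix)
}

// Interpretation 2: turn the prefix into a single-level glob "<prefix>/*",
// assuming "*" matches one path segment and never crosses a "/".
function matchesSingleLevelGlob (key, prefix) {
  const escaped = prefix.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
  return new RegExp(`^${escaped}/[^/]+$`).test(key)
}

const prefix = '/dht/provider'
const keys = [
  '/dht/provider/record', // direct child of the prefix
  '/dht/provider/group/record', // nested one level deeper
  '/dht/other/record' // different prefix
]

for (const key of keys) {
  console.log(key, {
    stringPrefix: matchesStringPrefix(key, prefix),
    singleLevelGlob: matchesSingleLevelGlob(key, prefix)
  })
}
```

Under these assumptions the two interpretations agree on the direct child and on the unrelated key, but only the string-prefix match returns the nested key.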

Next steps

  • Figure out what the expected format of the prefix string is, and whether it's OK for store implementations to deviate from that convention.
  • Potentially rectify the inconsistency in the https://github.com/ipfs/js-stores/ repo.

2color avatar Aug 05 '25 16:08 2color

Interesting. I would definitely switch to datastore-level if that is an option.

achingbrain avatar Aug 05 '25 17:08 achingbrain

@achingbrain thanks. Any thoughts on how we should address the prefix discrepancy between implementations?

2color avatar Aug 06 '25 08:08 2color

A test should be added to the datastore interface test suite that exposes the bug along with a patch to fix it.
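A rough sketch of what such a check might assert is shown below. This is not the actual interface test suite. The function checkPrefixQuery, the MapDatastore stand-in, and the choice to include a nested key are all assumptions made for illustration. The only behaviour taken from this thread is that query({ prefix }) should yield the entries stored under that prefix.

```javascript
// Sketch of a prefix-query check. `datastore` is any object exposing
// async put(key, value) and query({ prefix }) returning an async iterable
// of { key, value } pairs; `makeKey` builds a key from a string.
async function checkPrefixQuery (datastore, makeKey, prefix = '/dht/provider') {
  const expected = [
    makeKey(`${prefix}/record`),
    makeKey(`${prefix}/group/record`) // nested key: assumed to be worth covering
  ]
  const unrelated = makeKey('/unrelated/record')

  for (const key of [...expected, unrelated]) {
    await datastore.put(key, new Uint8Array([1]))
  }

  // collect every key the store yields for the prefix
  const found = []
  for await (const { key } of datastore.query({ prefix })) {
    found.push(key.toString())
  }

  const missing = expected.map(k => k.toString()).filter(k => !found.includes(k))
  const unexpected = found.filter(k => !k.startsWith(prefix))
  return { found, missing, unexpected, ok: missing.length === 0 && unexpected.length === 0 }
}

// Hypothetical Map-backed stand-in so the sketch runs without dependencies.
class MapDatastore {
  constructor () {
    this.entries = new Map()
  }

  async put (key, value) {
    this.entries.set(key.toString(), { key, value })
    return key
  }

  async * query ({ prefix = '' } = {}) {
    for (const [name, pair] of this.entries) {
      if (name.startsWith(prefix)) yield pair
    }
  }
}

// Example run against the stand-in, using plain strings as keys:
// checkPrefixQuery(new MapDatastore(), s => s).then(result => console.log(result))
```

Against a real store, makeKey would presumably be s => new Key(s) with Key imported from 'interface-datastore', and the same check could be run once per datastore implementation to compare results.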

achingbrain avatar Aug 06 '25 09:08 achingbrain

@akhileshthite Please try with datastore-level and blockstore-fs (for consistency).

2color avatar Aug 06 '25 11:08 2color

I can confirm that the following works:

import { unixfs } from '@helia/unixfs'
import { inspectorMetrics } from '@ipshipyard/libp2p-inspector-metrics'
import { loadOrCreateSelfKey } from '@libp2p/config'
import { kadDHT } from '@libp2p/kad-dht'
import { enable } from '@libp2p/logger'
import { FsBlockstore } from 'blockstore-fs'
import { LevelDatastore } from 'datastore-level'
import { createHelia, libp2pDefaults } from 'helia'

const datastore = new LevelDatastore('./datastore')
const blockstore = new FsBlockstore('./blockstore')

const libp2pConf = libp2pDefaults()
libp2pConf.metrics = inspectorMetrics()
libp2pConf.services.dht = kadDHT({
  reprovide: {
    interval: 1000 * 60 * 1 // reprovide every minute
  }
})
libp2pConf.privateKey = await loadOrCreateSelfKey(datastore)

enable('libp2p:kad-dht:reprovider*,libp2p:kad-dht:reprovider*:trace,libp2p:kad-dht*,-libp2p:kad-dht:query*,-libp2p:kad-dht:trace,-libp2p:kad-dht:routing-table*,-libp2p:kad-dht:topology-listener*,-libp2p:kad-dht:network*,-libp2p:kad-dht:peer-routing*') // less noisy

const helia = await createHelia({
  blockstore,
  datastore,
  libp2p: libp2pConf
})
console.log('Created Helia node with PeerID:', helia.libp2p.peerId.toString())

// create a filesystem on top of Helia, in this case it's UnixFS
const fs = unixfs(helia)
const encoder = new TextEncoder()

const cid = await fs.addFile({
  content: encoder.encode('Hello World 🗺️🌎🌍🌏 402!'),
  path: './hello-world.txt'
})

setInterval(async () => {
  console.log('listing provider records in the datastore...')

  for await (const entry of helia.datastore.query({
    prefix: '/dht/provider'
  })) {
    console.log('--- Datastore entry:', entry.key.toString(), entry.value.toString())
  }
}, 1000 * 1)

// Provide the block to the DHT so that other nodes can find and retrieve it
await helia.routing.provide(cid)
console.log('CID provided to the DHT:', cid.toString())

2color avatar Aug 06 '25 16:08 2color

This is looking good 👀

[Image attached]

CC @tripledoublev

akhileshthite avatar Aug 25 '25 12:08 akhileshthite

> Interesting. I would definitely switch to datastore-level if that is an option.

Could you elaborate on why this is recommended? 👀

m0ar avatar Nov 03 '25 08:11 m0ar