kubo icon indicating copy to clipboard operation
kubo copied to clipboard

Use multihashes, not CIDs, when checking if a CID is in a wantlist.

Open Stebalien opened this issue 8 years ago • 17 comments

Version information:

go-ipfs version: 0.4.11-dev-48476b292 Repo version: 5 System version: amd64/linux Golang version: go1.8.3

Type:

Enhancement

Severity:

Implementation: Low Figuring this out: High

Description:

Currently, we bitswap using CIDs. This means that, for example, if a peer asks for a CIDv1 thing that we store as a CIDv0 thing, we won't be able to give it to them. This is going to be a problem if we decide to switch to base32 as the default multihash encoding.

~See https://github.com/ipfs/notes/issues/259 for more design related discussion (this issue is to track the actual bug in go-ipfs).~

Stebalien avatar Aug 31 '17 20:08 Stebalien

We also need to do this for our:

  • caching blockstores.
  • Our datastores.
  • etc...

Stebalien avatar Aug 31 '17 20:08 Stebalien

I'm not sure how this is going to establish backward-compatibility -- older nodes will assume the queries are still CIDs?

ghost avatar Aug 31 '17 21:08 ghost

This is just internal; go-ipfs is internally comparing CIDs when it should be comparing multihashes. That is, when we receive a wantlist, we shouldn't take the CID into account when checking if we have the data.

Stebalien avatar Aug 31 '17 21:08 Stebalien

The only external part is "providing" blocks. The backwards-compatable fix would be for DHT nodes to ignore CID codecs when checking if they have any matching provider records but that's not the best solution.

Stebalien avatar Aug 31 '17 21:08 Stebalien

Yeah I get that part, and it helps with the problem that we'll be receiving base32 CIDv1s of stuff that's actually stored with a CIDv0. But older nodes (that compare based on CID, not multihash) won't be able to deal with that, right? So they won't be able to reply to queries.

Check out what we changed in Bitswap a year ago when introducing CID there.

ghost avatar Aug 31 '17 21:08 ghost

Here's the Bitswap PR from back then -- the notes about it got lost somewhere, @whyrusleeping or @diasdavid will hopefully know where they are.

ghost avatar Aug 31 '17 21:08 ghost

https://github.com/ipfs/go-ipfs/pull/3297

ghost avatar Aug 31 '17 21:08 ghost

Note. CIDs on the network is totally fine and kind of useful (it means one always knows how to interpret a block as IPLD); we'll even need them for DAGSwap.

The problem is that we shouldn't, e.g., refuse to respond to a CIDv1 request because we initially computed a CIDv0 for the block when storing it.

Stebalien avatar Aug 31 '17 22:08 Stebalien

@Stebalien what @lgierth is saying is that if we convert a CIDv0 to CIDv1 and the block is stored as a CIDv0 on an older node then that node will not return with the block even though it has it.

kevina avatar Aug 31 '17 22:08 kevina

Ah. And that's where the backwards compat issue comes in. I wonder if we should fix this issue, have a short upgrade period, and then switch to base32 by default.

Note: Fixing this issue won't introduce any problems, it's just that fixing this issue won't let us switch to base32 CIDv1's by default without any back-compat issues...

We could always bump the bitswap version and always send CIDv0 if possible (if the format is DagProtobuf to old peers (making the assumption that they probably don't have many CIDv1 nodes).

Stebalien avatar Aug 31 '17 22:08 Stebalien

Yes using multihashes is definitely helpful -- agree that we could bump the bitswap version and use that as a demarkation line.

Could you track down any writing on this from last year? I recall there was a pretty good document about the Bitswap aspect of the CID introduction.

ghost avatar Aug 31 '17 22:08 ghost

I have no idea where that might be; I'm just basing this off of what @diasdavid has been saying.

Stebalien avatar Aug 31 '17 23:08 Stebalien

It even had a neat table explaining the semantic differences of v0<->v0, v0<->v1, v1<->v1, and v1<->v0 communications. It's a bit funny because this issue here hints at basically reverting the CID change from back then, that's why I'm wondering where these docs have gone.

ghost avatar Aug 31 '17 23:08 ghost

@Stebalien Doesn't Bitswap use multihashes now, or should this issue still be open?

Winterhuman avatar Apr 10 '22 22:04 Winterhuman

Ya, because the blockstore is now keyed by multihash checking is effectively done by multihash even when the API requests by CID.

aschmahmann avatar Jul 29 '22 15:07 aschmahmann

Unfortunately, this isn't entirely fixed. If I add a new block, we check existing wantlists by CID, not multihash. Worse, fixing this is a bit tricky unless/until we change the Multihash representation to be string instead of []byte (for performance reasons).

Stebalien avatar Aug 01 '22 22:08 Stebalien

@Stebalien we can keep using whatever representation and just use unsafe to cast strings to bytes.

More realistically it's an edge case and we are rewriting the server anyway so might as well fix it.

Jorropo avatar Aug 01 '22 22:08 Jorropo

The blockstore now uses multihashes, but bitswap still has some issues.

This now works:

  1. Peer B adds block C with the raw multicodec.
  2. Peer A asks peer B for block C, but with the CBOR multicodec.
  3. Peer B checks its datastore, finds the block (ignoring the codec), returns it to peer A.

This doesn't:

  1. Peer A asks peer B for block C, but the CBOR multicodec.
  2. Peer B adds block C with the raw multicodec.
  3. Peer B checks to see if anyone is asking for block C but doesn't see peer A because it compares CIDs, not multihashes.

Stebalien avatar Oct 11 '22 07:10 Stebalien

an other which (I think, I havn't tested it) doesn't work is -1 relay of messages with different codecs:

  1. Peer A ask B block X with the raw multicodec.
  2. B also want X but as CBOR, so B ask C about X with CBOR codec.
  3. B receive X from C.
  4. B is supposed to send (or send a HAVE) to A about X, but because the codec don't match it doesn't.

Jorropo avatar Oct 11 '22 07:10 Jorropo