boxo icon indicating copy to clipboard operation
boxo copied to clipboard

pinning/pinner: Pinner should not be tasked to fetch blocks

Open MichaelMure opened this issue 2 years ago • 5 comments

Having worked on the Pinner in various way (including alternate implementation), it occurred to me that it really shouldn't be tasked to:

  1. write the root block into storage in Pin(): https://github.com/ipfs/go-ipfs-pinner/blob/72f5e02e73db5e05ef0a140a7938cbc89dfc38b0/dspinner/pin.go#L177
  2. fetch the full DAG in Pin(): https://github.com/ipfs/go-ipfs-pinner/blob/72f5e02e73db5e05ef0a140a7938cbc89dfc38b0/dspinner/pin.go#L202

This is for different (not obvious, I admit) reasons:

  • often, the caller side know if the blocks are local or not. In any case, it's trivial to fetch the blocks on the caller side if not.
  • that would allow to merge Pin and PinWithMode and simplify the API
  • this hurts composability in weird scenario like mine, where you want to control where that fetching is happening, and where the blocks are going (aka, not necessarily in the normal DAGService)
  • it might avoid double writes
  • it seems like this pain is already felt within kubo: https://github.com/ipfs/kubo/blob/82467bc936fe24355c930f7226fce7e41fee591c/core/commands/dag/import.go#L73-L87

Admittedly, things have worked that way for a long time, but fixing that might help in the long run. Looking at where Pin is called in kubo, it also looks like a non-invasive change.

MichaelMure avatar Feb 03 '23 13:02 MichaelMure

On an intuition level, this has a similar vibe as having the caller side of bitswap adding blocks in storage (https://github.com/ipfs/go-bitswap/pull/571), which helped splitting client and server side, among other things.

MichaelMure avatar Feb 03 '23 13:02 MichaelMure

cc @Jorropo for visibility, I remember you were suggesting refactoring the way we do fetch during pinning

lidel avatar Feb 06 '23 14:02 lidel

The pinner keeps track of long term state in it's internal pin database, it also keep track of state in the blockstore, it would be VERY weird if it kept track of some state but not all of it (if we still had the internal DB but not the blockstore part of it).

Currently it does not even store the blocks directly, it relies on side effect of the blockservice (blockservice stores all block downloaded).

Also unlike https://github.com/ipfs/go-bitswap/pull/571 where there was a bug where blocks were local puts were duplicated (once in blockservice and once in bitswap), I am not aware of any bug here, it's just very side effect heavy code that hard to use without the monolith architecture Kubo was built on.

Maybe there is something better elsewhere but this seems like a complete refactor of the pinner which I don't want to take on right now. I am not convainced that the two usecases here (pinning per requests and pinning on a monolith) are similar enough to warant sharing a package. Except than a traversal logic (which can be abstracted in a new or already existing third package) I don't see what they would have in commun.

Jorropo avatar Feb 21 '23 20:02 Jorropo

The pinner keeps track of long term state in it's internal pin database, it also keep track of state in the blockstore, it would be VERY weird if it kept track of some state but not all of it (if we still had the internal DB but not the blockstore part of it).

Yet that's what is actually happening. Consider those two examples (among other):

  • block/put: kubo insert the block into the blockService, then call PinWithMode, which only record the pin. No check is made, Pinner blindly accept the pin and guarantee nothing.
  • pin/add: kubo fetch the root block, then call Pin, which will insert that root node, fetch the rest of the DAG then record the pin. Here, the Pinner does guarantee that the data is there.

Also unlike https://github.com/ipfs/go-bitswap/pull/571 where there was a bug where blocks were local puts were duplicated (once in blockservice and once in bitswap), I am not aware of any bug here, it's just very side effect heavy code that hard to use without the monolith architecture Kubo was built on.

If you look at my pin/add example, I'm pretty sure there is a duplicated Put there. DAGService get the block from the network (side effect: put), then pass it to Pinner which will write it again.

Maybe there is something better elsewhere but this seems like a complete refactor of the pinner which I don't want to take on right now.

I'm only suggesting to eliminate Pin in favor of PinWithMode (although Pin is a better name). This function is only used 4 times in kubo (not counting tests), so that should be quite manageable. We get back to a uniform API, eliminates double write, remove the odd side effect that makes it hard to compose and reuse ...

As for bitswap, I'd like to ask you to do the exercise in reverse: why would Pinner need to write blocks? Why does it needs to fetch a DAG? I don't see better reason than "it was written that way in ancient ages because it was convenient".

MichaelMure avatar Feb 22 '23 12:02 MichaelMure

Additionally, Add() is currently racy with the GC, as it doesn't create a PinLock. This is because the pinner doesn't have access to the blockstore, while the calling side (kubo) does.

MichaelMure avatar Mar 03 '23 16:03 MichaelMure