blockservice: AddBlock will check file exist twice for new file
- The BlockService's AddBlock method look like:
# go-blockservice/blockservice.go
func (s *blockService) AddBlock(o blocks.Block) error {
if s.checkFirst {
if has, err := s.blockstore.Has(c); has || err != nil {
return err
}
}
if err := s.blockstore.Put(o); err != nil {
return err
}
}
- blockstore.Put method look like:
# go-ipfs-blockstore/blockstore.go
func (bs *blockstore) Put(block blocks.Block) error {
k := dshelp.CidToDsKey(block.Cid())
// Has is cheaper than Put, so see if we already have it
exists, err := bs.datastore.Has(k)
if err == nil && exists {
return nil // already stored.
}
return bs.datastore.Put(k, block.RawData())
}
- As we can see it check key exist twice, the check method is:
# go-ds-flatfs/flatfs.go
func (fs *Datastore) Has(key datastore.Key) (exists bool, err error) {
_, path := fs.encode(key)
switch _, err := os.Stat(path); {
case err == nil:
return true, nil
case os.IsNotExist(err):
return false, nil
default:
return false, err
}
}
It's expensive to check file exist with system call (os.Stat), maybe we can cache the check result.
is it right? maybe I made a mistake.
@whyrusleeping look forward to your reply.
@songjiayang that is correct. We try to skirt that issue by using a 'has-cached' blockstore. So the actual blockstore implementation given to the blockservice should be: https://github.com/ipfs/go-ipfs-blockstore/blob/master/arc_cache.go
Where all the has calls get cached in the ARC cache.
We should make this more clear, as I think that many users will just construct a simple blockstore, and thus incur the double has checks.
Also, I would like to get rid of the blockservice entirely, it feels more and more like a not-super-useful wrapper, and its job could be taken over entirely by something else.
cc @Stebalien
@whyrusleeping thanks very much.
how about changing the Put(block blocks.Block) method's response, A example like Put(block blocks.Block) isNew bool, err error. If changed, we can do some optimization work with the isNew result.
So, I generally like to avoid combining read and write operations like that. With that API, we'd have to atomically put the block and check if it already existed. That means, for example, we'd have to serialize concurrent put operations so that only one returns isNew.
Really, I agree with @whyrusleeping. We need to refactor the interfaces. Currently, we have two services responsible for putting the block but we really should have just one.