Dirty Segment re-open is not internally thread safe
These code sections are not thread safe, becuase testing the refcount does not protect the variables in the critical sections from multiple access. This is becuase nothing stops the refcount from incrementing after it is accesses, but before the code section is completed:
func (s *DirtySegment) Reopen(dir string) (err error) {
if s.refcount.Load() == 0 {
s.closeSeg()
s.Decompressor, err = seg.NewDecompressor(filepath.Join(dir, s.FileName()))
if err != nil {
return fmt.Errorf("%w, fileName: %s", err, s.FileName())
}
}
return nil
}
func (s *DirtySegment) ReopenIdxIfNeed(dir string, optimistic bool) (err error) {
if len(s.Type().IdxFileNames(s.version, s.from, s.to)) == 0 {
return nil
}
if s.refcount.Load() == 0 {
err = s.reopenIdx(dir)
if err != nil {
if !errors.Is(err, os.ErrNotExist) {
if optimistic {
log.Warn("[snapshots] open index", "err", err)
} else {
return err
}
}
}
}
return nil
}
At the moment this is protected by a hot fix which locks the dirty segments during tx creation e.g.
This is not ideal and is likely to be error prone because:
- It will not be obvious to callers of BeginRo that they need to do this
- The lock is overkill as it locks the whole segment when a local lock will suffice and is likely to cause performance issues due to lock contention.
func (s *RoSnapshots) View() *View {
s.visibleSegmentsLock.RLock()
defer s.visibleSegmentsLock.RUnlock()
var sgs btree.Map[snaptype.Enum, *RoTx]
s.segments.Scan(func(segtype snaptype.Enum, value *Segments) bool {
// BeginRo increments refcount - which is contended
s.dirtySegmentsLock.RLock()
defer s.dirtySegmentsLock.RUnlock()
sgs.Set(segtype, value.BeginRo())
return true
})
return &View{s: s, VisibleSegments: sgs, baseSegType: coresnaptype.Transactions} // Transactions is the last segment to be processed, so it's the most reliable.
}
A possible solution to this is to use an RWMutex with try lock instead, the protected code would then look like this:
func (s *DirtySegment) Reopen(dir string) (err error) {
if s.lock.TryLock() {
err = func() error {
defer s.lock.Unlock()
s.closeSeg()
s.Decompressor, err = seg.NewDecompressor(filepath.Join(dir, s.FileName()))
return err
}()
if err != nil {
return fmt.Errorf("%w, fileName: %s", err, s.FileName())
}
}
return nil
}
Step towards: https://github.com/erigontech/erigon/pull/12380 ( i will merge it after alpha5 )
In my head:
.View()must be lock-free: means it will do onlys.visible.Load()(wherevisibleisatomic.Pointerto a slice. this slice is pre-calculated in background byrecalcVisibleFiles()).View()will not touch dirty field - so it will not needdirtyMutex.lock()
Seems we are couple PR's from there.
seems fixed. feel free to re-open if not.