seg: set stale files not visible until recalcVisibleFiles done
FYI: seems i trying something similar: https://github.com/erigontech/erigon/pull/12380
FYI: seems i trying something similar: #12380
@AskAlexSharov Got it. one thing special in this PR is i try to keep closeAndRemoveFiles thread safe without lock, not sure if it's within your next step 's plan, so explain a bit:
I add an indicator stale in visibleSegment struct. After recalcVisibleSegments, unused old visible segments will be marked as stale. closeAndRemoveFiles happens only when refcount==0 && stale && src.canDelete. This can thread safe without lock because new rotx will be unable to read stale visible Segments.
I don’t understand how it can be called from multiple goroutines.
In my head 2parts 1: // recalcVisibleFiles do skip CanDelete==true // means new RoTx can’t see it
2: newValue := refcnt.AtomicDecrement() If newValue == 0 { closeAndDel() }
Atomics guarantee that all goroutines (and there can’t be new readers because point 1).
what I'm thinking here is exactly point1, canDelete==true seems not enough.
assuming two consecutive readers read segX during merge:
- merge step1: integrateMergedDirtyFiles marks segX as canDelete = true
- rotx_1 read, refCnt=1
- rotx_1 closing, refCnt=0 and canDelete = true
- rotx_2 read, refCnt=1, because recalcVisibleFiles has not been called yet, segX is still visible
- rotx_1 closeAndRemove segX
- rotx_2 access nil segX
- merge step2: recalcVisibleFiles, segX is removed from visible list
rotx_1 read, refCnt=1 and rotx_2 read, refCnt=1 - "read atomics" it's kind-of anti-pattern. Need do "compare_and_swap" things on them: we do leftReaders := idx.readAheadRefcnt.Add(-1)
merge itself is also reader Aggregator.mergeLoopStep does aggTx := a.BeginFilesRo()
(i'm not sure if RoSnapshots follow this practice - i need re-check)
You right - we can make sure that we set canDelete.Store(true) after recalcVisibleFiles:
a.integrateMergedDirtyFiles(outs, in) // can don't set `canDelete`
a.recalcVisibleFiles(a.DirtyFilesEndTxNumMinimax())
a.cleanAfterMerge(in) // set `canDelete`
(it happening inside a.BeginFilesRo() section)
thanks, you're correct about merge, i will work on point3 later
Auite a lot of changes landed in: https://github.com/erigontech/erigon/pull/12339
need somehow merge main
delSeg.canDelete.Store(true)
if delSeg.refcount.Load() == 0 {
delSeg.closeAndRemoveFiles()
}
we have above code to determine closing files or not. But in fact refcount cannot guard files If the code is called by multiple goroutines because atomic value can only keep itself atomic. Two ways to promise this never happen: 1. add a mutex 2. refcount never be 0 here(start a rotx).
merge and prune both call this code. merge is also a reader so refcount.Load()!=0, but prune is not a reader and does not use a mutex.
so the first modification in this PR is creating a view before prune, also this means we don't need if delSeg.refcount.Load() == 0.
The second modification is to implement this:
You right - we can make sure that we set
canDelete.Store(true)afterrecalcVisibleFiles:a.integrateMergedDirtyFiles(outs, in) // can don't set `canDelete` a.recalcVisibleFiles(a.DirtyFilesEndTxNumMinimax()) a.cleanAfterMerge(in) // set `canDelete`(it happening inside
a.BeginFilesRo()section)
we still need mark file as canDelete during merge or prune to skip recalc, so i add another flag stale to indicate that files are not visible after recalc and can be remove.
@AskAlexSharov I would appreciate any corrections
delSeg.canDelete.Store(true) // called by Last reader - at it’s Close/Rollback if delSeg.refcount.Load() == 0 { delSeg.closeAndRemoveFiles() }
delSeg.canDelete.Store(true) if delSeg.refcount.Load() == 0 { delSeg.closeAndRemoveFiles() } This code can’t protect from concurrency - you are roght. But we don’t have this code.
we have: refCnt := src.refcount.Add(-1) if refCnt == 0 && src.canDelete.Load() { and this code Can protect from concurrency: all readers will see different refCnt
https://github.com/erigontech/erigon/blob/a0889a7c552ead7513bb30aab16feaa63342f4e8/erigon-lib/state/history.go#L1118
i mean here, during the Prune: https://github.com/erigontech/erigon/blob/baf0a609131bd09f9a5f138be025c5ccd2ca1485/turbo/snapshotsync/snapshots.go#L1281
i didn't know we have it. yes if sn.refcount.Load() == 0 { is a logical race.
- "last reader" must close files.
isLastReader := refcount.Add(-1) == 0. merge - is also a reader. - only Merge does
canDelete.Store(true)- for simplicity.
Seems if out.refcount.Load() == 0 { also exists in /erigon-lib/state/files_item.go. so weird. but it works just because it happening inside BeginFilesRo :-) seem i need re-visit this place.
one check seems halt, try to re-trigger