milvus
[Bug]: WAL truncator doesn't work if there are no writes after a streamingnode restart or after the WAL is balanced away.
### Is there an existing issue for this?
- [x] I have searched the existing issues
### Environment
- Milvus version: v2.6.1
- Deployment mode(standalone or cluster):
- MQ type(rocksmq, pulsar or kafka):
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS):
- CPU/Memory:
- GPU:
- Others:
### Current Behavior
The WAL truncator doesn't persist any of its checkpoint samples, so after a streamingnode restart the samples held by the truncator are lost.
Meanwhile, `samplingTruncator` initializes `lastSampled` with `time.Now()`.
As a result, no checkpoint is sampled during the first half hour (the sample interval) after a streamingnode restart. If Milvus doesn't write any data after that half hour, the truncator never truncates the WAL.
```go
// newSamplingTruncator creates a new sampling truncator.
func newSamplingTruncator(
	checkpoint *WALCheckpoint,
	truncator walimpls.WALImpls,
	recoveryMetrics *recoveryMetrics,
) *samplingTruncator {
	st := &samplingTruncator{
		notifier:                syncutil.NewAsyncTaskNotifier[struct{}](),
		cfg:                     newTruncatorConfig(),
		truncator:               truncator,
		mu:                      sync.Mutex{},
		checkpointSamples:       []*WALCheckpoint{checkpoint},
		lastTruncatedCheckpoint: nil,
		lastSampled:             time.Now(),
		metrics:                 recoveryMetrics,
	}
	go st.background()
	return st
}
```
Also see #44369.
### Expected Behavior
_No response_
### Steps To Reproduce
```markdown
```
### Milvus Log
_No response_
### Anything else?
_No response_
/assign @chyezh
Implement a truncator with persisted status
Will be fixed by #45350.