Flake TestRevision
Which github workflows are flaking?
unit tests
Which tests are flaking?
go.etcd.io/etcd/server/v3/etcdserver/api/v3compactor: TestRevision
Github Action link
https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/etcd-io_etcd/17469/pull-etcd-unit-test/1760242318838861824
Reason for failure (if possible)
{Failed === RUN TestRevision
logger.go:130: 2024-02-21T09:59:44.200Z INFO starting auto revision compaction {"revision": 90, "revision-compaction-retention": 10}
revision_test.go:48: len(actions) = 0, expected >= 1
--- FAIL: TestRevision (0.05s)
}
Anything else we need to know?
No response
Hi,
from my understanding this comes from the wait timeout here:
https://github.com/etcd-io/etcd/blob/47e9a16a5c76718240b21e9be959c26556e87955/server/etcdserver/api/v3compactor/revision_test.go#L31
How do you normally handle such a case?
Following https://github.com/etcd-io/etcd/issues/17054#issuecomment-1843017613, adding a time.Sleep(11 * time.Millisecond) on https://github.com/etcd-io/etcd/blob/7ded2ac8de85509f30e4c4125b061ff36b0160bb/server/etcdserver/api/v3compactor/revision.go#L79 will allow you to reproduce the issue
I could be wrong, but it seems every rg.Wait(1) in the test if effectively just a time.Sleep(10 * time.Millisecond) + recorder stream channel flush
Unfortunately, the fix described in https://github.com/etcd-io/etcd/pull/17513 is not enough as "do not timeout" causes the very first rg.Wait(1) to wait forever as the fc.Advance(revInterval) that precedes it advances time before the Revision.Run loop has started (you can check this by sleeping before advancing time)
There is also this rg.Wait(1) which seems to serve no purpose https://github.com/etcd-io/etcd/blob/7ded2ac8de85509f30e4c4125b061ff36b0160bb/server/etcdserver/api/v3compactor/revision_test.go#L56
I believe to fix this, we would need to know when the Revision.Run loop is ready
A hacky way to implement this could be:
func newFakerClock() *fakerClock {
return &fakerClock{
FakeClock: clockwork.NewFakeClock(),
afterRequest: make(chan struct{}),
}
}
func (frc *fakerClock) After(d time.Duration) <-chan time.Time {
select {
case frc.afterRequest <- struct{}{}:
default:
}
return frc.FakeClock.After(d)
}
func TestRevision(t *testing.T) {
fc := newFakerClock() // <- this changed
rg := &fakeRevGetter{testutil.NewRecorderStreamWithWaitTimout(math.MaxInt64), 0} // <- this changed
compactable := &fakeCompactable{testutil.NewRecorderStreamWithWaitTimout(10 * time.Millisecond)}
tb := newRevision(zaptest.NewLogger(t), fc, 10, rg, compactable)
tb.Run()
defer tb.Stop()
<-fc.afterRequest // <- this is new
fc.Advance(revInterval)
rg.Wait(1)
// ...
though I do not know if that is an acceptable solution. Thoughts @ahrtr ?