go icon indicating copy to clipboard operation
go copied to clipboard

x/sys/unix: TestEventPortErrors failures

Open gopherbot opened this issue 2 years ago • 9 comments

#!watchflakes
post <- pkg == "golang.org/x/sys/unix" && test == "TestEventPortErrors"

Issue created automatically to collect these failures.

Example (log):

--- FAIL: TestEventPortErrors (0.00s)
    syscall_solaris_test.go:186: unexpected lack of timeout

watchflakes

gopherbot avatar Feb 02 '23 19:02 gopherbot

Found new dashboard test flakes for:

#!watchflakes
post <- pkg == "golang.org/x/sys/unix" && test == "TestEventPortErrors"
2023-01-31 16:01 illumos-amd64 sys@e7d7f631 go@4fe46cee x/sys/unix.TestEventPortErrors (log)
--- FAIL: TestEventPortErrors (0.00s)
    syscall_solaris_test.go:186: unexpected lack of timeout

watchflakes

gopherbot avatar Feb 02 '23 19:02 gopherbot

This test is Solaris-specific (attn @golang/solaris @golang/illumos), and was added in CL 324630 and adjusted in CL 422338 (attn @nshalman @tklauser @ianlancetaylor).

This test isn't in line with https://go.dev/wiki/CodeReviewComments#useful-test-failures: it doesn't log what actually happened. In particular it would be useful to know what the actual error returned by port.GetOne was, not just the fact that it wasn't ETIME. So probably a good starting point would be to fix up the test to log more information on failure.

bcmills avatar Feb 02 '23 20:02 bcmills

Change https://go.dev/cl/465015 mentions this issue: unix: improve flaky solaris test logging

gopherbot avatar Feb 02 '23 21:02 gopherbot

See previously #45643.

bcmills avatar Feb 02 '23 21:02 bcmills

It would really help us if we could get the OS-illumos label put on these sorts of issues when they apply to illumos as well as Solaris, FWIW.

jclulow avatar Feb 02 '23 22:02 jclulow

Good point — my apologies for not thinking to add it!

bcmills avatar Feb 02 '23 22:02 bcmills

Per the discussion in https://go.dev/cl/465015 I think the next step is to make the tests handle the syscalls returning EINTR more gracefully. The occurrence reported by the bot isn't the only place where it could happen.

Rambling to myself: The tricky part is that I don't have any clever ideas of how to intentionally exercise those code paths in the tests... Perhaps there's a clever way with dtrace? The important thing would be to document how to trigger the EINTR to make sure that the tests when running in CI will do all the right things, not to actually trigger EINTR in every CI run (I think....)

EDIT to add: Probably don't need dtrace. Better is to define a maximum number of times we'll tolerate being interrupted during such tests, wrap the call in a for loop, loop around on EINTR, break out on the expected value (whether nil or ETIME or whatever) and fail the test if we exit the loop with EINTR again. The happy path will enter the loop once, the flaky path will probably only loop once or twice, and the failure path can be exercised by temporarily adding a line explicitly setting err to EINTR.

That code might get a little verbose, so it might make sense to write one wrapper each for Get and GetOne.

nshalman avatar Feb 02 '23 23:02 nshalman

My current sketch (needs some light commenting, and to be used in all the tests currently calling GetOne...):

diff --git a/unix/syscall_solaris_test.go b/unix/syscall_solaris_test.go
index 6c2b906..d79e9fb 100644
--- a/unix/syscall_solaris_test.go
+++ b/unix/syscall_solaris_test.go
@@ -18,6 +18,24 @@ import (
        "golang.org/x/sys/unix"
 )

+var MAX_EINTR = 10
+
+func getOneRetry(t *testing.T, p *unix.EventPort, timeout *unix.Timespec) (e *unix.PortEvent, err error) {
+       t.Helper()
+       for i := 1; i < MAX_EINTR; i++ {
+               e, err = p.GetOne(timeout)
+               // to exercise the MAX_EINTR uncomment the following line
+               //err = unix.EINTR
+               if err != unix.EINTR {
+                       break
+               }
+       }
+       if err != nil {
+               return nil, err
+       }
+       return e, err
+}
+
 func TestStatvfs(t *testing.T) {
        if err := unix.Statvfs("", nil); err == nil {
                t.Fatal(`Statvfs("") expected failure`)
@@ -181,7 +199,7 @@ func TestEventPortErrors(t *testing.T) {
        }
        timeout := new(unix.Timespec)
        timeout.Nsec = 1
-       _, err = port.GetOne(timeout)
+       _, err = getOneRetry(t, port, timeout)
        if err != unix.ETIME {
                // See https://go.dev/issue/58259
                // Perhaps we sometimes get EINTR ???

nshalman avatar Feb 03 '23 02:02 nshalman

Change https://go.dev/cl/465055 mentions this issue: unix: make solaris syscall tests less flaky

gopherbot avatar Feb 03 '23 02:02 gopherbot

Found new dashboard test flakes for:

#!watchflakes
post <- pkg == "golang.org/x/sys/unix" && test == "TestEventPortErrors"
2023-02-24 13:20 illumos-amd64 sys@92c4c39f go@f76fc28c x/sys/unix.TestEventPortErrors (log)
--- FAIL: TestEventPortErrors (0.00s)
    syscall_solaris_test.go:188: port.GetOne(&{0 1}) returned error interrupted system call, want timer expired

watchflakes

gopherbot avatar Mar 08 '23 18:03 gopherbot

Looks like the improved error message works. Good reminder for me to finish up https://go.dev/cl/465055

nshalman avatar Mar 08 '23 18:03 nshalman