dd-trace-go icon indicating copy to clipboard operation
dd-trace-go copied to clipboard

contrib/gorm.io/gorm.v1: Flaky TestSQLServer

Open felixge opened this issue 3 years ago • 3 comments

This test appears flaky, see: https://app.circleci.com/pipelines/github/DataDog/dd-trace-go/3878/workflows/86cd2577-3e4d-43b8-b85c-f943f71b40dc/jobs/27790/tests#failed-test-0

This failure is particularly interesting. Unless we're calling Rows.Close() on a nil pointer, this could be a bug in the stdlib or something of similar interest.

    --- FAIL: TestSQLServer/Query (0.02s)
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x7e1e16]

goroutine 181 [running]:
testing.tRunner.func1.2(0xb35f20, 0x110b7a0)
	/usr/local/go/src/testing/testing.go:1143 +0x332
testing.tRunner.func1(0xc000103980)
	/usr/local/go/src/testing/testing.go:1146 +0x4b6
panic(0xb35f20, 0x110b7a0)
	/usr/local/go/src/runtime/panic.go:965 +0x1b9
database/sql.(*Rows).close(0x0, 0x0, 0x0, 0x0, 0x0)
	/usr/local/go/src/database/sql/sql.go:3165 +0x76
database/sql.(*Rows).Close(0x0, 0xb0cde0, 0xca5ca0)
	/usr/local/go/src/database/sql/sql.go:3161 +0x33
gopkg.in/DataDog/dd-trace-go.v1/contrib/internal/sqltest.testQuery.func1(0xc000103980)
	/home/circleci/dd-trace-go.v1/contrib/internal/sqltest/sqltest.go:151 +0x56f
testing.tRunner(0xc000103980, 0xc0002e9860)
	/usr/local/go/src/testing/testing.go:1193 +0xef
created by testing.(*T).Run
	/usr/local/go/src/testing/testing.go:1238 +0x2b3

cc @DataDog/tracing-go

See #1320 which is similar (seems like we support to different versions/flavors of gorm?)

felixge avatar Jun 02 '22 07:06 felixge

RE: gorm flavors, github.com/jinzhu/gorm was moved to gorm.io/gorm.v1, and the former is no longer updated. See also #1219

nsrip-dd avatar Jun 02 '22 18:06 nsrip-dd

Small update: 5000 local runs with the same DB server as the CI running in a Docker image. Could not reproduce. Perhaps SSH-ing into CI is needed.

gbbr avatar Jun 06 '22 12:06 gbbr

RE: nil pointer dereference, the failure is because of calling Rows.Close on a nil pointer, because the rows are nil due to the error that shows up earlier in the linked failure:

Expected nil, but got: mssql.Error{Number:208, State:0x1, Class:0x10, Message:"Invalid object name 'testgorm'.", ServerName:"19fb0c797101", ProcName:"", LineNo:1, All:[]mssql.Error{mssql.Error{Number:208, State:0x1, Class:0x10, Message:"Invalid object name 'testgorm'.", ServerName:"19fb0c797101", ProcName:"", LineNo:1, All:

The problem is here in the SQL test helper package. A non-nil error from Query should end the test, but it doesn't and the test proceeds to access an invalid object. There are a lot of tests in this repo that do some form of assert.NotNil(err) where they should probably do require.NotNil(err). Maybe a bulk refactor is in order?

Also, a lot of these database-related flakes are of the form "table testgorm does not exist". This looks more like unreliability of the CI environment than a flaw in the tests themselves IMO. Is there much we can do to mitigate that?

nsrip-dd avatar Jun 13 '22 13:06 nsrip-dd