gocql icon indicating copy to clipboard operation
gocql copied to clipboard

Data Race During --race Test

Open steve-gray opened this issue 3 years ago • 8 comments
trafficstars

Probably needs proper mutexing - this one periodically causes flaking in CI for me. The items hitting it are: github.com/gocql/gocql.(*tokenAwareHostPolicy).Init() vs github.com/gocql/gocql. (*tokenAwareHostPolicy).updateReplicas()

image

steve-gray avatar Jan 21 '22 04:01 steve-gray

It seems this is something that should be fixed in upstream gocql/gocql as it seems that upstream is affected too. I'll be happy to merge the fix upstream.

martin-sucha avatar Jan 22 '22 10:01 martin-sucha

I've checked the code, but it seems gocql does not spawn goroutines using the policy before calling Init, so there shouldn't be any race. Do you by any chance reuse the policy instance between multiple sessions? Unfortunately I don't see from the screenshot where goroutine 61 was spawned, but it seems that it was spawned somewhere in Session.init, which is called after returning from tokenAwareHostPolicy.Init.

martin-sucha avatar Jan 22 '22 11:01 martin-sucha

The issue is the assignment of the logger on line 401 during Init() can race with a use of the logger in the updateReplicas where it passes to getStrategy. The updateReplicas is getting called concurrently with the Init() because the schema event listener debouncer starts further up in the sequence, and if you do per-test schema operations, its possible to get into a state where one arrives during the race, because it's seeing stuff like the previous tests keyspace getting torn down.

	s.schemaEvents = newEventDebouncer("SchemaEvents", s.handleSchemaEvent, s.logger)

In short, if a schema update arrives during startup, you can get this. I believe without the race detector, it'll probably nil-pointer as the logger wasn't assigned.

steve-gray avatar Jan 22 '22 21:01 steve-gray

The issue is the assignment of the logger on line 401 during Init() can race with a use of the logger in the updateReplicas where it passes to getStrategy.

But what leads to that? It seems that Init is called and returns before any connections are created, so how do you reach the Session.handleEvent that ultimately leads to updateReplicas?

Init is called in NewSession:

https://github.com/scylladb/gocql/blob/6aca262598eadfc2c433ac6f790fd8b6abf86f88/session.go#L161

There are no goroutines that gocql spawns until this point.

First connection is made in Session.init:

https://github.com/scylladb/gocql/blob/6aca262598eadfc2c433ac6f790fd8b6abf86f88/session.go#L181

The only explanation I have right now is that the policy instance is used with multiple sessions. Is that the case? Or why do we reach handleEvent? Could you please provide example code that reproduces the issue?

martin-sucha avatar Jan 25 '22 10:01 martin-sucha

Multiple sessions, but each only created after the previous has been .Closed()'d. However, I can't share the entire body of the application - but it seems like the thrashing during my unit tests (creating and deleting lots of keyspaces, as each test uses its own) makes this exponentially more likely.

steve-gray avatar Jan 25 '22 17:01 steve-gray

Reusing host selection policies between multiple sessions (even after closing the old session) is not supported as the old session might still be using the policy. Adding a lock to Init won't fix the fact that the old session is still using the policy. I suggest you use a new instance of the policy every time you create a session. I guess we need to document that reusing policies is not supported and panic in tokenAwareHostPolicy.Init if the policy was already initialized.

martin-sucha avatar Jan 26 '22 06:01 martin-sucha

Sounds like it’s a more along the lines of a defective design pattern - and the strategies should be passed as a factory so that each session the driver summons can get its own.

However that boat has probably sailed in terms keeping things compatible with todays world. It’d be a fairly breaking change to fix.

Get Outlook for iOShttps://aka.ms/o0ukef


From: Martin Sucha @.> Sent: Wednesday, January 26, 2022 4:30:56 PM To: scylladb/gocql @.> Cc: Steve Gray @.>; Author @.> Subject: Re: [scylladb/gocql] Data Race During --race Test (Issue #94)

Reusing host selection policies between multiple sessions (even after closing the old session) is not supported as the old session might still be using the policy. Adding a lock to Init won't fix the fact that the old session is still using the policy. I suggest you use a new instance of the policy every time you create a session. I guess we need to document that reusing policies is not supported and panic in tokenAwareHostPolicy.Init if the policy was already initialized.

— Reply to this email directly, view it on GitHubhttps://github.com/scylladb/gocql/issues/94#issuecomment-1021905431, or unsubscribehttps://github.com/notifications/unsubscribe-auth/ADPDTP46JG5ZHUB75FZGHHDUX6IKBANCNFSM5MOREHVQ. You are receiving this because you authored the thread.Message ID: @.***>

steve-gray avatar Jan 26 '22 08:01 steve-gray

Sounds like it’s a more along the lines of a defective design pattern - and the strategies should be passed as a factory so that each session the driver summons can get its own.

However that boat has probably sailed in terms keeping things compatible with todays world. It’d be a fairly breaking change to fix.

Yes, I agree. Unfortunately the policy interface is like this for a long time and there could be external implementations, so we can't easily change it.

martin-sucha avatar Jan 26 '22 08:01 martin-sucha