ristretto icon indicating copy to clipboard operation
ristretto copied to clipboard

Fixes closing of Cache and Policy to be concurrent safe

Open josephschorr opened this issue 3 years ago • 17 comments

Before this change, calling Close on the cache while a Set call is in progress could occasionally result in a panic when the Set method tried to write to the (now closed) setBuf channel. This commit changes Cache and Policy to keep references to the written-to channels in an internals struct, which is nil-ed out on close and checked in each call site, ensuring that nothing is attempting to write to a closed channel.


This change is Reviewable

josephschorr avatar Oct 08 '21 17:10 josephschorr

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

CLAassistant avatar Oct 08 '21 17:10 CLAassistant

One open question: Should we still try to close out the setBuf channel post Close, perhaps in a goroutine, for full cleanup?

josephschorr avatar Oct 08 '21 17:10 josephschorr

I can't see the TeamCity build results (no access), but go test ./... passes locally, for reference.

josephschorr avatar Oct 08 '21 17:10 josephschorr

Thanks for the PR @josephschorr. CI is failing with a data race.

==================
WARNING: DATA RACE
Read at 0x00c0001df0d0 by goroutine 90:
  github.com/dgraph-io/ristretto.(*Cache).setInternal()
      /home/agent/work/7f32719a4b7aa00/cache.go:296 +0x5d
  github.com/dgraph-io/ristretto.(*Cache).SetWithTTL()
      /home/agent/work/7f32719a4b7aa00/cache.go:281 +0x18f
  github.com/dgraph-io/ristretto.(*Cache).Set()
      /home/agent/work/7f32719a4b7aa00/cache.go:273 +0x13e
  github.com/dgraph-io/ristretto.TestConcurrentSetAfterClose.func1()
      /home/agent/work/7f32719a4b7aa00/cache_test.go:240 +0x128

Previous write at 0x00c0001df0d0 by goroutine 79:
  github.com/dgraph-io/ristretto.(*Cache).Close()
      /home/agent/work/7f32719a4b7aa00/cache.go:427 +0xcf
  github.com/dgraph-io/ristretto.TestConcurrentSetAfterClose()
      /home/agent/work/7f32719a4b7aa00/cache_test.go:260 +0x2d1
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:1050 +0x1eb

Goroutine 90 (running) created at:
  github.com/dgraph-io/ristretto.TestConcurrentSetAfterClose()
      /home/agent/work/7f32719a4b7aa00/cache_test.go:233 +0x20a
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:1050 +0x1eb

Goroutine 79 (running) created at:
  testing.(*T).Run()
      /usr/local/go/src/testing/testing.go:1095 +0x537
  testing.runTests.func1()
      /usr/local/go/src/testing/testing.go:1339 +0xa6
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:1050 +0x1eb
  testing.runTests()
      /usr/local/go/src/testing/testing.go:1337 +0x594
  testing.(*M).Run()
      /usr/local/go/src/testing/testing.go:1252 +0x2ff
  main.main()
      _testmain.go:200 +0x223
==================
--- FAIL: TestConcurrentSetAfterClose (0.20s)
    testing.go:965: race detected during execution of test
FAIL
FAIL	github.com/dgraph-io/ristretto	64.962s
?   	github.com/dgraph-io/ristretto/contrib/demo	[no test files]
?   	github.com/dgraph-io/ristretto/contrib/memtest	[no test files]
ok  	github.com/dgraph-io/ristretto/sim	0.032s
ok  	github.com/dgraph-io/ristretto/z	95.954s
ok  	github.com/dgraph-io/ristretto/z/simd	0.029s
FAIL

Try running the tests with the -race flag (that's what ./test.sh runs).

go test -race ./...

danielmai avatar Oct 08 '21 17:10 danielmai

D'oh... I forgot the race detector doesn't run automatically. Will address now.

josephschorr avatar Oct 08 '21 17:10 josephschorr

@danielmai So looking more deeply, the race is more or less the "expected behavior" on that variable, which is why I keep a local copy. I can easily fix this by adding a mutex guard around accessing it, but that does mean that there will be some slight slowdown on all the function calls.

Thoughts?

Edit: Decided to add it for safety

josephschorr avatar Oct 08 '21 17:10 josephschorr

I'll defer to @manishrjain to chime in for this PR.

danielmai avatar Oct 08 '21 18:10 danielmai

@manishrjain Thoughts on the use of a lock vs, say, a compare and swap?

In terms of timing, we're trying to decide whether to have SpiceDB make use of my branch or wait for this to merge and release, so guidance in that area would also be appreciated.

josephschorr avatar Oct 13 '21 15:10 josephschorr

Hey @josephschorr , for now, my recommendation would be to use your fork. Meanwhile, I'll try to find some time to review this PR and understand it's performance implications.

manishrjain avatar Oct 13 '21 16:10 manishrjain

Any updates regarding this PR ?

gaby avatar Dec 11 '22 06:12 gaby

Friendly ping @josephschorr @manishrjain

gaby avatar Jan 11 '23 13:01 gaby

I'm no longer managing this repo. I have my own fork of ristretto that I manage and use.

https://github.com/outcaste-io/ristretto

manishrjain avatar Jan 11 '23 18:01 manishrjain

@josephschorr would you please accept the CLA and resolve the conflict. @joshua-goldstein, @pandeyshubham25, @matthewmcneely could one of you review the code and help with resolving the conflict. Also, let's make sure that the version is updated appropriately.

akon-dey avatar Jan 12 '23 17:01 akon-dey

@akon-dey I'd be happy to rebase my changes, but be aware we've moved to the github.com/outcaste-io/ristretto fork, which fixed the issue as well. This is partially why we stopped asking for updates on our fork containing this branch.

I'll see if I can get this rebased this evening

josephschorr avatar Jan 12 '23 18:01 josephschorr

@josephschorr I was wondering what's the reasoning for using the fork, is not like this repo was abandoned?

gaby avatar Jan 14 '23 18:01 gaby

@gaby This PR sat for quite a while, so we used our own fork during that time. Once github.com/outcaste-io/ristretto became available, we tested it for the fix, found it worked, and so moved to it rather than relying on a custom fork that would not get updates otherwise.

josephschorr avatar Jan 14 '23 18:01 josephschorr

@gaby This PR sat for quite a while, so we used our own fork during that time. Once github.com/outcaste-io/ristretto became available, we tested it for the fix, found it worked, and so moved to it rather than relying on a custom fork that would not get updates otherwise.

Makes sense, thanks for the feedback!

gaby avatar Jan 14 '23 19:01 gaby