diskv icon indicating copy to clipboard operation
diskv copied to clipboard

Make the strict memory limit optional

Open eric opened this issue 4 years ago • 6 comments
trafficstars

We would rather not have a panic() if the memory limit is exceeded due to concurrent access.

This makes the panic optional.

eric avatar Jan 20 '21 19:01 eric

Hmm, I don't think this PR does what you think it does — that panic indicates a reasonably serious error, I think. Can you say a bit more what you're trying to get out of this?

peterbourgon avatar Jan 21 '21 13:01 peterbourgon

Haha, you're totally right. I ended up trying to fix the wrong panic. It's actually this panic that we are running into regularly with errors like:

159451 bytes still won't fit in the cache! (max 10485760 bytes)

As I look at this problem again, I'm feeling more confused than I was before. Previously I was thinking we were running into a race condition, but I see that the caller of ensureCacheSpaceWithLock() takes out a write lock, so there shouldn't be any way that this panic happens if everything has been deleted from the cache, right?

eric avatar Jan 27 '21 06:01 eric

Yes, I can't see how that would manifest. Confusing. Would you be willing to try and reproduce with a bit more info added to the panic?

- panic(fmt.Sprintf("%d bytes still won't fit in the cache! (max %d bytes)", valueSize, d.CacheSizeMax))
+ panic(fmt.Sprintf("%d bytes still won't fit in the cache! (current %d objects and %d bytes, max %d bytes)", valueSize, len(d.cache), d.cacheSize, d.CacheSizeMax)) 

peterbourgon avatar Jan 27 '21 13:01 peterbourgon

I've adapted it to have an option to turn all of these panics into errors and am also adding more to the message so we can diagnose what's going on.

eric avatar Jan 27 '21 19:01 eric

I appreciate the pragmatism of the PR, but the panic is a panic and not an error for a reason :) as this condition appears to have identified a serious bug somewhere. ~If you can't try out the patch I suggested above, can you provide a reproducible test case, or maybe tell me how to build one myself?~

edit: Oops, missed your second clause somehow. I'll wait for more details from you 👍

peterbourgon avatar Jan 27 '21 19:01 peterbourgon

We have this deployed on consumer devices and it only happens infrequently. We don't have any steps to reproduce it.

It's much better for us to eventually stop caching values and quietly return an error than it is to take down the process and stop working entirely.

I agree that it appears there's a real bug here, but we need something to make it so that it doesn't crash, or stop using diskv entirely until we understand what the bug is.

eric avatar Jan 27 '21 19:01 eric