influxdb icon indicating copy to clipboard operation
influxdb copied to clipboard

feat: TagValueIterator holds RLock for too long (#26369)

Open devanbenz opened this issue 7 months ago • 0 comments

I looks like we are holding a reader lock and defer'ing it, while at the same time acquiring that same reader lock within the call to return tk.TagValueIterator()

According to Go's RWMutex documentation this is not a good idea: https://pkg.go.dev/sync#RWMutex

If any goroutine calls RWMutex.Lock while the lock is already held by one or more readers, concurrent calls to RWMutex.RLock will block until the writer has acquired (and released) the lock, to ensure that the lock eventually becomes available to the writer. Note that this prohibits recursive read-locking.

I believe this is what is causing the deadlock we are seeing in a few different tickets. I had ran a reproducer where I was effectively reading and writing from a bucket that I had set the retention policy to 1 minute and the shard rollover to 30 seconds. I was able to reproduce the deadlock consistently this way. Adding some logging around the mutexes I am seeing the following during every reproduction:

TagValueIterator is RUnlock()
TagValueIterator is about to RLock()
TagValueIterator is RLock()
TagValueIterator tk about to RLock()
TagValueIterator tk is RLock()
TagValueIterator tk about to RUnlock()
TagValueIterator tk is RUnlock()
TagValueIterator is RUnlock()
TagValueIterator is about to RLock()
TagValueIterator is RLock()
TagValueIterator tk about to RLock()
TagValueIterator tk is RLock()
TagValueIterator tk about to RUnlock()
TagValueIterator tk is RUnlock()
TagValueIterator is RUnlock()
TagValueIterator is about to RLock()
TagValueIterator is RLock()
Lock() is about to be called in AddSeriesList
TagValueIterator tk about to RLock()

The TagValueIterator tk is found within return tk.TagValueIterator()

https://github.com/influxdata/influxdb/blob/eebe655e4c596b8f78bae30e5729092b6732d1a7/tsdb/index/tsi1/log_file.go#L1388

The AddSeriesList Lock() is found here:

https://github.com/influxdata/influxdb/blob/eebe655e4c596b8f78bae30e5729092b6732d1a7/tsdb/index/tsi1/log_file.go#L545

So it appears that we are attempt to lock inside both AddSeriesList and TagValueIterator with different Lock types on the same RWMutex.

After modifying the code to the following I'm no longer seeing the problem. I have a feeling that the recursive call to RLock() is the underlying issue.

(cherry picked from commit 2e842ff21d159cce6ed66266b35706846afe0fe8)

Closes #

Describe your proposed changes here.

  • [ ] I've read the contributing section of the project README.
  • [ ] Signed CLA (if not already signed).

devanbenz avatar May 08 '25 20:05 devanbenz