Possible Deadlock in TagValueIterator() and AddSeriesList()
Issue Summary
I suspect a potential deadlock related to the TagValueIterator() function when interacting with AddSeriesList().
Possible Deadlock Scenario
The issue appears to arise from conflicting RLock() and Lock() calls on f.mu within LogFile. Specifically:
- TagValueIterator() acquires an RLock() on f.mu.
- It then calls tk.TagValueIterator(), which attempts to acquire another RLock() on tk.f.mu (which is the same mutex as f.mu).
- Meanwhile, AddSeriesList() is called and attempts to acquire a write Lock() on f.mu while that RLock() is still held.
- This can lead to a deadlock, since Go's sync.RWMutex does not allow a Lock() to proceed while an RLock() is held, and a pending Lock() in turn blocks any new RLock() calls.
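A minimal standalone sketch of this interaction (illustrative only, not the InfluxDB code; the sleeps merely force the problematic interleaving):
// Illustrative sketch, not the InfluxDB code: a pending Lock() between two
// RLock() calls on the same sync.RWMutex leaves both goroutines blocked.
package main

import (
    "fmt"
    "sync"
    "time"
)

func main() {
    var mu sync.RWMutex

    go func() {
        mu.RLock() // first RLock, like LogFile.TagValueIterator
        defer mu.RUnlock()

        time.Sleep(100 * time.Millisecond) // give the writer time to start waiting

        mu.RLock() // second RLock, like logTagKey.TagValueIterator: blocks behind the pending writer
        fmt.Println("second RLock acquired")
        mu.RUnlock()
    }()

    time.Sleep(50 * time.Millisecond)

    mu.Lock() // writer, like AddSeriesList: waits for the first RLock, which is never released
    fmt.Println("Lock acquired")
    mu.Unlock()
}
When run as a standalone program, neither message is printed and the Go runtime reports that all goroutines are asleep.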
Relevant Code
TagValueIterator() (log_file.go)
func (f *LogFile) TagValueIterator(name, key []byte) TagValueIterator {
    f.mu.RLock() // First RLock
    defer f.mu.RUnlock()
    mm, ok := f.mms[string(name)]
    if !ok {
        return nil
    }
    tk, ok := mm.tagSet[string(key)]
    if !ok {
        return nil
    }
    return tk.TagValueIterator() // Calls tk.TagValueIterator(), which also acquires RLock
}
tk.TagValueIterator() (log_file.go)
func (tk *logTagKey) TagValueIterator() TagValueIterator {
    tk.f.mu.RLock() // Second RLock (on the same f.mu)
    a := make([]logTagValue, 0, len(tk.tagValues))
    for _, v := range tk.tagValues {
        a = append(a, v)
    }
    tk.f.mu.RUnlock()
    return newLogTagValueIterator(a)
}
AddSeriesList() (log_file.go)
func (f *LogFile) AddSeriesList(...) {
    // ...
    f.mu.Lock() // Write lock on f.mu
    defer f.mu.Unlock()
    // ...
}
pprof Output When Deadlock Occurred
goroutine 106814401 [semacquire, 6 minutes]:
sync.runtime_SemacquireMutex(0xc00015020c?, 0x78?, 0x3?)
/usr/local/go/src/runtime/sema.go:77 +0x25
sync.(*RWMutex).Lock(0xc023a48620?)
/usr/local/go/src/sync/rwmutex.go:152 +0x71
github.com/influxdata/influxdb/v2/tsdb/index/tsi1.(*LogFile).AddSeriesList(0xc0095a71d0, 0xc000150200, {0xc00863f800?, 0x13, 0x0?}, {0xc00863fb00?, 0x13, 0xc00e37daf8?})
influxdb-2.6.0/tsdb/index/tsi1/log_file.go:545 +0x4a5
github.com/influxdata/influxdb/v2/tsdb/index/tsi1.(*Partition).createSeriesListIfNotExists(0xc037ff10e0, {0xc00863f800, 0x13, 0x20}, {0xc00863fb00, 0x13, 0x20})
influxdb-2.6.0/tsdb/index/tsi1/partition.go:725 +0x165
goroutine 106814631 [semacquire, 6 minutes]:
sync.runtime_SemacquireMutex(0x3318308?, 0x38?, 0xc?)
/usr/local/go/src/runtime/sema.go:77 +0x25
sync.(*RWMutex).RLock(...)
/usr/local/go/src/sync/rwmutex.go:71
github.com/influxdata/influxdb/v2/tsdb/index/tsi1.(*logTagKey).TagValueIterator(0xc02a1a6fb8)
influxdb-2.6.0/tsdb/index/tsi1/log_file.go:1385 +0x51
github.com/influxdata/influxdb/v2/tsdb/index/tsi1.(*LogFile).TagValueIterator(0xc0095a71d0?, {0xc04537e640?, 0xa?, 0x158ed72?}, {0xc03be04a20, 0x9, 0x28?})
influxdb-2.6.0/tsdb/index/tsi1/log_file.go:432 +0x185
Currently, the only way to recover from this issue is to restart InfluxDB, which is problematic.
I am not sure this is a deadlock caused by upgrading the lock type of f.mu here. I don't see AddSeriesList called within the same scope as TagValueIterator. As shown by the following example, https://goplay.tools/snippet/67evkRgdE9R, the only way a deadlock occurs is when a.mu.RLock() is held and then a call into a.two() is made. If the functions are called independently of each other's scopes, it does not deadlock.
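For reference, a sketch of that pattern (the linked playground is assumed to look roughly like this; the names a, one, and two are illustrative):
// Sketch of the pattern described above (assumed shape of the playground
// example): independent calls to one(), two(), and write() complete fine;
// the deadlock only appears if one() holds the RLock while calling two().
package main

import (
    "fmt"
    "sync"
)

type a struct {
    mu sync.RWMutex
}

func (x *a) one() {
    x.mu.RLock()
    defer x.mu.RUnlock()
    fmt.Println("one")
    // x.two() // nesting the second RLock inside this scope is the risky case
}

func (x *a) two() {
    x.mu.RLock()
    defer x.mu.RUnlock()
    fmt.Println("two")
}

func (x *a) write() {
    x.mu.Lock()
    defer x.mu.Unlock()
    fmt.Println("write")
}

func main() {
    var x a
    var wg sync.WaitGroup
    for i := 0; i < 100; i++ {
        wg.Add(3)
        go func() { defer wg.Done(); x.one() }()
        go func() { defer wg.Done(); x.two() }()
        go func() { defer wg.Done(); x.write() }()
    }
    wg.Wait() // completes, because each RLock/RUnlock is scoped to its own call
}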
Are you seeing:
fatal error: all goroutines are asleep - deadlock!
in your stack traces?
I think the example at https://goplay.tools/snippet/DNGy9ENYYHj is more relevant. In this example, neither "Second RLock" nor "First Lock" is printed.
When this issue occurs, it seems that schema.measurementTagValues is being called at the same time a new series is added. In my running program, there was additional code that periodically checks tag values.
Could you check this condition?
To make it easier to reproduce the issue, I modified the Go code by adding a delay between the first and second RLock calls in TagValueIterator. Then, if one client inserts a new series while another client checks tag values, the issue occurs consistently.
Ah okay. In the example you sent this is not a deadlock. This is more of a lock ordering issue. I would need to trace through the code more and have a reproducer to actually find the issue. A few questions.
- Are you actually seeing the bug in a production environment?
- If you are seeing the bug can you send me the output you're expecting vs the output you're seeing?
- If you are seeing the bug can you send me the exact steps you use to reproduce?
Yes, I agree — it seems more like a lock ordering issue rather than a classic deadlock.
- We are indeed seeing this issue in a production environment.
- The pprof output was captured during a reproduction of the same issue in our production environment.
- Reproduction steps:
  - Create one bucket/measurement.
  - One client writes data every 1 minute (e.g., tag1,tag2,value). tag1 and tag2 occasionally receive new values, and the total cardinality appears to reach up to ~200,000.
  - A second client requests the tag list every 5 minutes (e.g., a tag1 list).
  - The issue seems to occur when there's a conflict between write operations and calls to schema.measurementTagValues.
As a temporary workaround, I'm modifying the client program to avoid using schema.measurementTagValues.
If TagValueIterator is only called via schema.measurementTagValues, I suspect the issue will no longer occur.
However, if there are other internal callers of TagValueIterator, this could still be a potential problem.
Coming back to this. I'm still investigating whether or not I can write a test to reproduce this issue. One thing I will note is that I don't believe the code path you highlighted is problematic, because of the following:
// TagValueIterator returns a value iterator for a tag key.
func (f *LogFile) TagValueIterator(name, key []byte) TagValueIterator {
    f.mu.RLock()
    defer f.mu.RUnlock()
    mm, ok := f.mms[string(name)]
    if !ok {
        return nil
    }
    tk, ok := mm.tagSet[string(key)]
    if !ok {
        return nil
    }
    return tk.TagValueIterator()
}
The call returns, which makes the defer f.mu.RUnlock() release the lock from within the LogFile.TagValueIterator scope. I updated the Go playground you sent so that we're returning from the method where the mutex's RLock/RUnlock get acquired and released the second time. When running it with the return, this works as expected: https://goplay.tools/snippet/YR1sVxAT_Mp
I'm still going to attempt to repro the issue you're seeing but this code path appears to be a red herring at the moment.
In the original code, defer f.mu.RUnlock() keeps the read lock (RLock) active until the function returns — which includes the call to tk.TagValueIterator().
However, here's what can happen:
- Before tk.TagValueIterator() tries to acquire its own RLock, another goroutine (AddSeriesList) attempts to acquire a write lock (Lock) on the same RWMutex.
- In Go, once a Lock() is waiting, new RLock() calls are blocked, and the Lock() itself waits for all existing RLock()s to be released.
- As a result, the RLock() inside tk.TagValueIterator() gets blocked waiting for the pending Lock() to complete, while the Lock() is still waiting for the original RUnlock() — which won't happen until tk.TagValueIterator() returns.
This creates a lock contention situation where all parties are waiting and nothing can proceed.
The following approach seems cleaner and potentially more efficient, as it avoids holding the read lock during the tk.TagValueIterator() call:
func (f *LogFile) TagValueIterator(name, key []byte) TagValueIterator {
    f.mu.RLock()
    mm, ok := f.mms[string(name)]
    if !ok {
        f.mu.RUnlock()
        return nil
    }
    tk, ok := mm.tagSet[string(key)]
    if !ok {
        f.mu.RUnlock()
        return nil
    }
    f.mu.RUnlock()
    return tk.TagValueIterator()
}
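Another possible shape for the fix (just a sketch, not tested against the real code) would be to take the snapshot of tk.tagValues while the first RLock is still held, so f.mu is never acquired a second time:
// Sketch only (untested): snapshot the tag values under the RLock that
// TagValueIterator already holds, instead of re-acquiring f.mu in
// logTagKey.TagValueIterator.
func (f *LogFile) TagValueIterator(name, key []byte) TagValueIterator {
    f.mu.RLock()
    defer f.mu.RUnlock()

    mm, ok := f.mms[string(name)]
    if !ok {
        return nil
    }
    tk, ok := mm.tagSet[string(key)]
    if !ok {
        return nil
    }

    // Copy tk.tagValues here, under the lock we already hold.
    a := make([]logTagValue, 0, len(tk.tagValues))
    for _, v := range tk.tagValues {
        a = append(a, v)
    }
    return newLogTagValueIterator(a)
}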
During my testing, I observed that the RLock() in tk.TagValueIterator() was using the same RWMutex instance as the one previously locked; the addresses were identical. Of course, this wouldn't be an issue if the RLock() in tk.TagValueIterator() were operating on a separate mutex unrelated to f.mu.
Unfortunately I cannot reproduce. If you remove the sleeps and use wait groups in the Go playground, like so: https://goplay.tools/snippet/IVLik43yhML, you'll notice that the lock contention does not occur. I have a feeling that pprof was taking a snapshot during the semaphore acquire and that's a red herring. I do have a draft PR out to attempt to repro the issue, though. You're welcome to poke around at it and see if you can get a reproducer in our code: https://github.com/influxdata/influxdb/pull/26193
It looks like there's someone else experiencing the same issue in this post: https://community.influxdata.com/t/suspected-deadlock-issue-need-help/56772. Seems like it's the same problem with TagValueIterator and AddSeriesList, right?
@line301 would you mind sending me the exact query (or queries) that you're using when you see the problem occur? Or, if you're using a client SDK, can you point me to a gist with the code in it? Also, please send me some line protocol for your schema. I'll be sure to ask the other user for the same info. I'll need that to try to reproduce.
We add a time.Sleep to allow context switching.
func (tk *logTagKey) TagValueIterator() TagValueIterator {
    time.Sleep(1 * time.Second) // <- for context switching
    tk.f.mu.RLock() // Second RLock (on the same f.mu)
    a := make([]logTagValue, 0, len(tk.tagValues))
    for _, v := range tk.tagValues {
        a = append(a, v)
    }
    tk.f.mu.RUnlock()
    return newLogTagValueIterator(a)
}
Write:
test,sensor_id=0 value=1
test,sensor_id=1 value=1
test,sensor_id=2 value=1
test,sensor_id=3 value=1
test,sensor_id=4 value=1
~
Query at the same time:
import "influxdata/influxdb/schema"
schema.mesurementTagValues(
bucket:"my-bucket",
measurement:"test",
tag:"sensor_id"
)
I added a sleep to force context switching, since it's unpredictable when context switching would normally occur. For the write operation, a new sensor_id is added every second. Separately, the query runs every 0.5 seconds. With this setup, the issue can be consistently reproduced.
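For reference, a minimal driver for this setup might look like the sketch below (assumptions: a local InfluxDB 2.x instance at http://localhost:8086, and placeholder token, org, and bucket names):
// Sketch of the reproduction setup described above. The URL, token, org, and
// bucket are placeholders; adjust to your environment.
package main

import (
    "bytes"
    "fmt"
    "net/http"
    "time"
)

const (
    baseURL = "http://localhost:8086" // assumed local instance
    token   = "my-token"              // assumed API token
    org     = "my-org"
    bucket  = "my-bucket"
)

func post(path, contentType, body string) {
    req, err := http.NewRequest("POST", baseURL+path, bytes.NewBufferString(body))
    if err != nil {
        fmt.Println("request error:", err)
        return
    }
    req.Header.Set("Authorization", "Token "+token)
    req.Header.Set("Content-Type", contentType)
    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        fmt.Println("request error:", err)
        return
    }
    resp.Body.Close()
}

func main() {
    // Writer: a new sensor_id (i.e., a new series) every second.
    go func() {
        for i := 0; ; i++ {
            line := fmt.Sprintf("test,sensor_id=%d value=1", i)
            post("/api/v2/write?org="+org+"&bucket="+bucket, "text/plain", line)
            time.Sleep(1 * time.Second)
        }
    }()

    // Reader: tag values via schema.measurementTagValues every 0.5 seconds.
    flux := fmt.Sprintf(`import "influxdata/influxdb/schema"
schema.measurementTagValues(bucket: %q, measurement: "test", tag: "sensor_id")`, bucket)
    for {
        post("/api/v2/query?org="+org, "application/vnd.flux", flux)
        time.Sleep(500 * time.Millisecond)
    }
}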
@line301 I was able to finally reproduce this issue and am currently implementing a fix.
https://github.com/influxdata/influxdb/commit/2e842ff21d159cce6ed66266b35706846afe0fe8 I have merged the PR for the fix here.
Closing issue as this will be part of our 2.7.12 release
Can you please check this?
https://community.influxdata.com/t/suspected-deadlock-issue-need-help/56772/2
I am unclear what you are asking here. Check what in what way? We believe that this bug was fixed in 2.7.12. There has been a report of a similar problem in 2.7.12 on Windows.
Can you please check this?
https://community.influxdata.com/t/suspected-deadlock-issue-need-help/56772/2
I was asking line301 to check for similarities.
When I posted my comment (on another interface, the one from the Influx community forum), the text from line301 was the only one visible. Somehow, in this interface (GitHub), my answer got intertwined with other messages; sorry about that.
Hi yes, I was the one who found it on windows and am still experiencing major issues with it. It started on one server and now has started impacting multiple. We have since reverted all of our installations back to 2.7.5 or 2.7.6 which seem very stable for us. We have no choice but to stay on those versions until this is resolved. If I can provide any helpful information please let me know.
@domticchione - so you are seeing this on 2.7.12?
Yes, 2.7.12 on Windows. Here is a link to my original post: #26797