Possible Deadlock in TagValueIterator() and AddSeriesList()
Issue Summary
I suspect a potential deadlock related to the TagValueIterator() function when interacting with AddSeriesList().
Possible Deadlock Scenario
The issue appears to arise from conflicting RLock() and Lock() calls on f.mu within LogFile. Specifically:
- TagValueIterator() acquires an RLock() on f.mu.
- It then calls tk.TagValueIterator(), which attempts to acquire another RLock() on tk.f.mu (which is the same mutex as f.mu).
- Meanwhile, AddSeriesList() is called and attempts to acquire a write Lock() on f.mu while that RLock() is still held.
- This can lead to a deadlock, since Go's sync.RWMutex does not allow a Lock() to proceed while an RLock() is held, and a pending Lock() in turn blocks any new RLock() calls.
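A minimal standalone sketch of this interaction (illustrative only, not the InfluxDB code; the sleeps merely force the problematic interleaving):
// Illustrative sketch, not the InfluxDB code: a pending Lock() between two
// RLock() calls on the same sync.RWMutex leaves both goroutines blocked.
package main

import (
    "fmt"
    "sync"
    "time"
)

func main() {
    var mu sync.RWMutex

    go func() {
        mu.RLock() // first RLock, like LogFile.TagValueIterator
        defer mu.RUnlock()

        time.Sleep(100 * time.Millisecond) // give the writer time to start waiting

        mu.RLock() // second RLock, like logTagKey.TagValueIterator: blocks behind the pending writer
        fmt.Println("second RLock acquired")
        mu.RUnlock()
    }()

    time.Sleep(50 * time.Millisecond)

    mu.Lock() // writer, like AddSeriesList: waits for the first RLock, which is never released
    fmt.Println("Lock acquired")
    mu.Unlock()
}
When run as a standalone program, neither message is printed and the Go runtime reports that all goroutines are asleep.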
Relevant Code
TagValueIterator() (log_file.go)
func (f *LogFile) TagValueIterator(name, key []byte) TagValueIterator {
    f.mu.RLock() // First RLock
    defer f.mu.RUnlock()
    mm, ok := f.mms[string(name)]
    if !ok {
        return nil
    }
    tk, ok := mm.tagSet[string(key)]
    if !ok {
        return nil
    }
    return tk.TagValueIterator() // Calls tk.TagValueIterator(), which also acquires RLock
}
tk.TagValueIterator() (log_file.go)
func (tk *logTagKey) TagValueIterator() TagValueIterator {
    tk.f.mu.RLock() // Second RLock (on the same f.mu)
    a := make([]logTagValue, 0, len(tk.tagValues))
    for _, v := range tk.tagValues {
        a = append(a, v)
    }
    tk.f.mu.RUnlock()
    return newLogTagValueIterator(a)
}
AddSeriesList() (log_file.go)
func (f *LogFile) AddSeriesList(...) {
    // ...
    f.mu.Lock() // Write lock on f.mu
    defer f.mu.Unlock()
    // ...
}
pprof Output When Deadlock Occurred
goroutine 106814401 [semacquire, 6 minutes]:
sync.runtime_SemacquireMutex(0xc00015020c?, 0x78?, 0x3?)
/usr/local/go/src/runtime/sema.go:77 +0x25
sync.(*RWMutex).Lock(0xc023a48620?)
/usr/local/go/src/sync/rwmutex.go:152 +0x71
github.com/influxdata/influxdb/v2/tsdb/index/tsi1.(*LogFile).AddSeriesList(0xc0095a71d0, 0xc000150200, {0xc00863f800?, 0x13, 0x0?}, {0xc00863fb00?, 0x13, 0xc00e37daf8?})
influxdb-2.6.0/tsdb/index/tsi1/log_file.go:545 +0x4a5
github.com/influxdata/influxdb/v2/tsdb/index/tsi1.(*Partition).createSeriesListIfNotExists(0xc037ff10e0, {0xc00863f800, 0x13, 0x20}, {0xc00863fb00, 0x13, 0x20})
influxdb-2.6.0/tsdb/index/tsi1/partition.go:725 +0x165
goroutine 106814631 [semacquire, 6 minutes]:
sync.runtime_SemacquireMutex(0x3318308?, 0x38?, 0xc?)
/usr/local/go/src/runtime/sema.go:77 +0x25
sync.(*RWMutex).RLock(...)
/usr/local/go/src/sync/rwmutex.go:71
github.com/influxdata/influxdb/v2/tsdb/index/tsi1.(*logTagKey).TagValueIterator(0xc02a1a6fb8)
influxdb-2.6.0/tsdb/index/tsi1/log_file.go:1385 +0x51
github.com/influxdata/influxdb/v2/tsdb/index/tsi1.(*LogFile).TagValueIterator(0xc0095a71d0?, {0xc04537e640?, 0xa?, 0x158ed72?}, {0xc03be04a20, 0x9, 0x28?})
influxdb-2.6.0/tsdb/index/tsi1/log_file.go:432 +0x185
Currently, the only way to recover from this issue is to restart InfluxDB, which is problematic.
I am not sure this is a deadlock caused by upgrading the lock type of f.mu here. I don't see AddSeriesList called within the same scope as TagValueIterator. As shown by the following example, https://goplay.tools/snippet/67evkRgdE9R, the only way a deadlock occurs is when a.mu.RLock() is held and then a call into a.two() is made. If the functions are called independently of each other's scopes, it does not deadlock.
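For reference, a sketch of that pattern (the linked playground is assumed to look roughly like this; the names a, one, and two are illustrative):
// Sketch of the pattern described above (assumed shape of the playground
// example): independent calls to one(), two(), and write() complete fine;
// the deadlock only appears if one() holds the RLock while calling two().
package main

import (
    "fmt"
    "sync"
)

type a struct {
    mu sync.RWMutex
}

func (x *a) one() {
    x.mu.RLock()
    defer x.mu.RUnlock()
    fmt.Println("one")
    // x.two() // nesting the second RLock inside this scope is the risky case
}

func (x *a) two() {
    x.mu.RLock()
    defer x.mu.RUnlock()
    fmt.Println("two")
}

func (x *a) write() {
    x.mu.Lock()
    defer x.mu.Unlock()
    fmt.Println("write")
}

func main() {
    var x a
    var wg sync.WaitGroup
    for i := 0; i < 100; i++ {
        wg.Add(3)
        go func() { defer wg.Done(); x.one() }()
        go func() { defer wg.Done(); x.two() }()
        go func() { defer wg.Done(); x.write() }()
    }
    wg.Wait() // completes, because each RLock/RUnlock is scoped to its own call
}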
Are you seeing:
fatal error: all goroutines are asleep - deadlock!
in your stack traces?
I think the example at https://goplay.tools/snippet/DNGy9ENYYHj is more relevant. In this example, neither "Second RLock" nor "First Lock" is printed.
When this issue occurs, it seems that schema.measurementTagValues is being called at the same time a new series is added. In my running program, there was additional code that periodically checks tag values.
Could you check this condition?
To make it easier to reproduce the issue, I modified the Go code by adding a delay between the first and second RLock calls in TagValueIterator. Then, if one client inserts a new series while another client checks tag values, the issue occurs consistently.
Ah okay. In the example you sent this is not a deadlock. This is more of a lock ordering issue. I would need to trace through the code more and have a reproducer to actually find the issue. A few questions.
- Are you actually seeing the bug in a production environment?
- If you are seeing the bug can you send me the output you're expecting vs the output you're seeing?
- If you are seeing the bug can you send me the exact steps you use to reproduce?
Yes, I agree — it seems more like a lock ordering issue rather than a classic deadlock.
- We are indeed seeing this issue in a production environment.
- The pprof output was captured during a reproduction of the same issue in our production environment.
- Reproduction steps:
  - Create one bucket/measurement.
  - One client writes data every 1 minute (e.g., tag1,tag2,value). tag1 and tag2 occasionally receive new values, and the total cardinality appears to reach up to ~200,000.
  - A second client requests the tag list every 5 minutes (e.g., a tag1 list).
  - The issue seems to occur when there's a conflict between write operations and calls to schema.measurementTagValues.
As a temporary workaround, I'm modifying the client program to avoid using schema.measurementTagValues.
If TagValueIterator is only called via schema.measurementTagValues, I suspect the issue will no longer occur.
However, if there are other internal callers of TagValueIterator, this could still be a potential problem.
Coming back to this. I'm still investigating whether or not I can write a test to reproduce this issue. One thing I will note is that I don't believe the code path you highlighted is problematic, because of the following:
// TagValueIterator returns a value iterator for a tag key.
func (f *LogFile) TagValueIterator(name, key []byte) TagValueIterator {
    f.mu.RLock()
    defer f.mu.RUnlock()
    mm, ok := f.mms[string(name)]
    if !ok {
        return nil
    }
    tk, ok := mm.tagSet[string(key)]
    if !ok {
        return nil
    }
    return tk.TagValueIterator()
}
The call returns, which makes the defer f.mu.RUnlock() release the lock from within the LogFile.TagValueIterator scope. I updated the Go playground you sent so that we're returning from the method where the mutex's RLock/RUnlock get acquired and released the second time. When running it with the return, this works as expected: https://goplay.tools/snippet/YR1sVxAT_Mp
I'm still going to attempt to repro the issue you're seeing but this code path appears to be a red herring at the moment.
In the original code, defer f.mu.RUnlock() keeps the read lock (RLock) active until the function returns — which includes the call to tk.TagValueIterator().
However, here's what can happen:
- Before tk.TagValueIterator() tries to acquire its own RLock, another goroutine (AddSeriesList) attempts to acquire a write lock (Lock) on the same RWMutex.
- In Go, once a Lock() is waiting, new RLock() calls are blocked, and the Lock() itself waits for all existing RLock()s to be released.
- As a result, the RLock() inside tk.TagValueIterator() gets blocked waiting for the pending Lock() to complete, while the Lock() is still waiting for the original RUnlock() — which won't happen until tk.TagValueIterator() returns.
This creates a lock contention situation where all parties are waiting and nothing can proceed.
The following approach seems cleaner and potentially more efficient, as it avoids holding the read lock during the tk.TagValueIterator() call:
func (f *LogFile) TagValueIterator(name, key []byte) TagValueIterator {
    f.mu.RLock()
    mm, ok := f.mms[string(name)]
    if !ok {
        f.mu.RUnlock()
        return nil
    }
    tk, ok := mm.tagSet[string(key)]
    if !ok {
        f.mu.RUnlock()
        return nil
    }
    f.mu.RUnlock()
    return tk.TagValueIterator()
}
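Another possible shape for the fix (just a sketch, not tested against the real code) would be to take the snapshot of tk.tagValues while the first RLock is still held, so f.mu is never acquired a second time:
// Sketch only (untested): snapshot the tag values under the RLock that
// TagValueIterator already holds, instead of re-acquiring f.mu in
// logTagKey.TagValueIterator.
func (f *LogFile) TagValueIterator(name, key []byte) TagValueIterator {
    f.mu.RLock()
    defer f.mu.RUnlock()

    mm, ok := f.mms[string(name)]
    if !ok {
        return nil
    }
    tk, ok := mm.tagSet[string(key)]
    if !ok {
        return nil
    }

    // Copy tk.tagValues here, under the lock we already hold.
    a := make([]logTagValue, 0, len(tk.tagValues))
    for _, v := range tk.tagValues {
        a = append(a, v)
    }
    return newLogTagValueIterator(a)
}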
During my testing, I observed that the RLock() in tk.TagValueIterator() was using the same RWMutex instance as the one previously locked; the addresses were identical. Of course, this wouldn't be an issue if the RLock() in tk.TagValueIterator() were operating on a separate mutex unrelated to f.mu.
Unfortunately I cannot reproduce. If you remove the sleeps and use wait groups in the Go playground, like so: https://goplay.tools/snippet/IVLik43yhML, you'll notice that the lock contention does not occur. I have a feeling that pprof was taking a snapshot during the semaphore acquire and that's a red herring. I do have a draft PR out to attempt to repro the issue, though. You're welcome to poke around at it and see if you can get a reproducer in our code: https://github.com/influxdata/influxdb/pull/26193
It looks like there's someone else experiencing the same issue in this post: https://community.influxdata.com/t/suspected-deadlock-issue-need-help/56772. Seems like it's the same problem with TagValueIterator and AddSeriesList, right?
@line301 would you mind sending me the exact query (or queries) that you're using when you see the problem occur? Or, if you're using a client SDK, can you point me to a gist with the code in it? Also, please send me some line protocol for your schema. I'll be sure to ask the other user for the same info. I'll need that to try to reproduce.
We add a time.Sleep to allow context switching.
func (tk *logTagKey) TagValueIterator() TagValueIterator {
    time.Sleep(1 * time.Second) // <- for context switching
    tk.f.mu.RLock() // Second RLock (on the same f.mu)
    a := make([]logTagValue, 0, len(tk.tagValues))
    for _, v := range tk.tagValues {
        a = append(a, v)
    }
    tk.f.mu.RUnlock()
    return newLogTagValueIterator(a)
}
Write:
test,sensor_id=0 value=1
test,sensor_id=1 value=1
test,sensor_id=2 value=1
test,sensor_id=3 value=1
test,sensor_id=4 value=1
~
Query at the same time:
import "influxdata/influxdb/schema"
schema.mesurementTagValues(
bucket:"my-bucket",
measurement:"test",
tag:"sensor_id"
)
I added a sleep to force context switching, since it's unpredictable when context switching would normally occur. For the write operation, a new sensor_id is added every second. Separately, the query runs every 0.5 seconds. With this setup, the issue can be consistently reproduced.
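For reference, a minimal driver for this setup might look like the sketch below (assumptions: a local InfluxDB 2.x instance at http://localhost:8086, and placeholder token, org, and bucket names):
// Sketch of the reproduction setup described above. The URL, token, org, and
// bucket are placeholders; adjust to your environment.
package main

import (
    "bytes"
    "fmt"
    "net/http"
    "time"
)

const (
    baseURL = "http://localhost:8086" // assumed local instance
    token   = "my-token"              // assumed API token
    org     = "my-org"
    bucket  = "my-bucket"
)

func post(path, contentType, body string) {
    req, err := http.NewRequest("POST", baseURL+path, bytes.NewBufferString(body))
    if err != nil {
        fmt.Println("request error:", err)
        return
    }
    req.Header.Set("Authorization", "Token "+token)
    req.Header.Set("Content-Type", contentType)
    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        fmt.Println("request error:", err)
        return
    }
    resp.Body.Close()
}

func main() {
    // Writer: a new sensor_id (i.e., a new series) every second.
    go func() {
        for i := 0; ; i++ {
            line := fmt.Sprintf("test,sensor_id=%d value=1", i)
            post("/api/v2/write?org="+org+"&bucket="+bucket, "text/plain", line)
            time.Sleep(1 * time.Second)
        }
    }()

    // Reader: tag values via schema.measurementTagValues every 0.5 seconds.
    flux := fmt.Sprintf(`import "influxdata/influxdb/schema"
schema.measurementTagValues(bucket: %q, measurement: "test", tag: "sensor_id")`, bucket)
    for {
        post("/api/v2/query?org="+org, "application/vnd.flux", flux)
        time.Sleep(500 * time.Millisecond)
    }
}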
@line301 I was able to finally reproduce this issue and am currently implementing a fix.
https://github.com/influxdata/influxdb/commit/2e842ff21d159cce6ed66266b35706846afe0fe8 I have merged the PR for the fix here.
Closing issue as this will be part of our 2.7.12 release
Can you please check this?
https://community.influxdata.com/t/suspected-deadlock-issue-need-help/56772/2
I am unclear what you are asking here. Check what in what way? We believe that this bug was fixed in 2.7.12. There has been a report of a similar problem in 2.7.12 on Windows.
Can you please check this?
https://community.influxdata.com/t/suspected-deadlock-issue-need-help/56772/2
I was asking line301 to check for similarities.
When I posted my comment (on another interface, the one from the Influx community forum), the text from line301 was the only one visible. Somehow, in this interface (GitHub), my answer got intertwined with other messages; sorry about that.
Hi yes, I was the one who found it on windows and am still experiencing major issues with it. It started on one server and now has started impacting multiple. We have since reverted all of our installations back to 2.7.5 or 2.7.6 which seem very stable for us. We have no choice but to stay on those versions until this is resolved. If I can provide any helpful information please let me know.
@domticchione - so you are seeing this on 2.7.12?
Yes, 2.7.12 on Windows. Here is a link to my original post: #26797