Potential InfluxDB v2.7.12 Deadlock
I am running InfluxDB v2.7.12 on a Windows 10 server and experiencing a deadlock very similar to this bug, which was fixed in v2.7.12: https://github.com/influxdata/influxdb/issues/26164
I installed InfluxDB by downloading the binaries here: https://www.influxdata.com/downloads/
Note: I do not know how to reproduce this issue. I suspect it is occurring due to competing read/write calls and locks.
I have an InfluxDB v2.7.12 server which freezes and becomes non-responsive at random times with no explanation as to why. I will do my best to explain the issue and the steps I have taken to debug it. Please let me know if any more information is needed or would be helpful; I am happy to provide it.
The issues began after months of this server running. Data was being written to the server for a few months, but it wasn't until about 6 months in that we started noticing issues. The server is running on Windows 10 and uses NSSM as a wrapper for the Windows service. There are about 200 series. We recently deleted the entire database to start fresh, and the problem is still happening. We also have countless other installs set up identically to this one (except for what data is written), and none of them have experienced this issue in the past 3-4 years. Also, there are no performance issues on this machine; InfluxDB has plenty of RAM and CPU, even when deadlocking and afterward.
The issues begin with C# API errors saying "A task was cancelled". Shortly after that we begin receiving HTTP connection errors. At this point influxd.exe is still running but not responding to anything, including API calls or the UI on :8086. This seems to occur at random, at different times of the day; sometimes the server runs for 2-3 weeks, other times only a few hours or days before deadlocking.
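For reference, "not responding" means that even a trivial health check hangs or times out. A minimal sketch of the kind of external probe that can timestamp exactly when the server stops answering (plain Go standard library; myserver:8086 is a placeholder for the real host, and /health is the standard InfluxDB 2.x health endpoint):

package main

import (
	"fmt"
	"net/http"
	"time"
)

func main() {
	// Placeholder address for the InfluxDB 2.x server under observation.
	const healthURL = "http://myserver:8086/health"

	// A short timeout so a hung server shows up as a failure instead of blocking forever.
	client := &http.Client{Timeout: 5 * time.Second}

	for {
		start := time.Now()
		resp, err := client.Get(healthURL)
		if err != nil {
			// During the deadlock this is where the requests start timing out.
			fmt.Printf("%s  /health FAILED after %v: %v\n", start.Format(time.RFC3339), time.Since(start), err)
		} else {
			fmt.Printf("%s  /health -> %s in %v\n", start.Format(time.RFC3339), resp.Status, time.Since(start))
			resp.Body.Close()
		}
		time.Sleep(10 * time.Second)
	}
}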
My first debugging step was to turn on Influx's debug logging. I let that run until the server stopped responding, and the logs stopped on a "bucket find" call. The bucket find call seemed to repeat 10-15 times, and then the logs would just stop until we restarted the .exe. This was consistent on just about every crash I debugged. Here is an example:
ts=2025-09-08T18:28:42.677880Z lvl=debug msg="bucket find" log_id=0yreEBzW000 store=new took=0.000ms
ts=2025-09-08T18:28:42.677880Z lvl=debug msg="bucket find" log_id=0yreEBzW000 store=new took=0.000ms
ts=2025-09-08T18:28:42.677880Z lvl=debug msg="bucket find" log_id=0yreEBzW000 store=new took=0.000ms
I then looked at the Event Viewer to make sure the computer wasn't updating, restarting, or limiting Influx's ability to execute. Nothing was found.
After that I began to inspect the data files using the commands in influxd.exe. Nothing out of the ordinary was found there, and keep in mind this is a freshly created database it's happening on. New data was being written, however.
I then checked the hard drive being used to make sure there were no issues there. All of the checks I have run found nothing. We even took Influx and everything else off the drive, reformatted it, and re-installed. It is still occurring.
Lastly, I took a .dmp of influxd while the "deadlock" was occurring. I am unable to compile influxd right now, so I do not have the correct symbols; however, I will post what I did find in the hope that someone here understands it. Here is the stack:
STACK_TEXT:
000000000733f298 00007ffcaa47da3e : 0000000000482140 0000000006ff06a0 0000000000000102 0000000000480601 : ntdll!NtWaitForSingleObject+0x14
000000000733f2a0 00000000004821c9 : 0000000006ff06a0 000000000028b000 0000000000000000 0000000000000134 : KERNELBASE!WaitForSingleObjectEx+0x8e
000000000733f340 000000000048054d : 0000000006fee360 00007ffcaa494d2c 000000000733f728 0000000000000778 : influxd+0x821c9
000000000733f4b0 0000000000480601 : 000000000733f4f8 0000000000000000 000000000733f520 000000000733f530 : influxd+0x8054d
000000000733f4e0 000000000043c91b : 0000000000482140 0000000006ff06a0 0000000000000000 000000000733f548 : influxd+0x80601
000000000733f508 000000000043ca56 : 0000000000000002 000000000733f5b0 000000000043bb45 0000000002907c68 : influxd+0x3c91b
000000000733f540 000000000043bb45 : 0000000002907c68 0000000000000134 00000000ffffffff 0000000000000000 : influxd+0x3ca56
000000000733f558 0000000000410e65 : 0000000004b80370 0000000000432df3 0000000000000004 0000000000000005 : influxd+0x3bb45
000000000733f5c0 000000000043446e : 0000000000000003 0000000000000000 000000c000068480 0000000000000002 : influxd+0x10e65
000000000733f628 000000000043436b : 0000000000019391 0000000000000001 000000c000068480 0015294000000000 : influxd+0x3446e
000000000733f670 0000000000433fff : 0000000006ff0600 000000000733f710 0000000000410f96 0000000006ff0600 : influxd+0x3436b
000000000733f6d8 0000000000410f96 : 0000000006ff0600 000000000733f710 0000000006fee360 000000000703d108 : influxd+0x33fff
000000000733f6f0 000000000043d677 : 0000000002907e08 0000000000000778 000000000733f770 0000000000000001 : influxd+0x10f96
000000000733f720 000000000044fd55 : 000000000733fca0 000000000733fcb8 0000000000424f96 000000c004c0c800 : influxd+0x3d677
000000000733fc80 0000000000424f96 : 000000c004c0c800 0000000000000060 000000060042a96a 0000000000000000 : influxd+0x4fd55
000000000733fc98 000000000042ab45 : 000000c004c0c800 000000000733fd00 000000000042331d 000000c00007b250 : influxd+0x24f96
000000000733fcc8 000000000042331d : 000000c00007b250 000000c00007b250 0000000000000000 000000c002b7f340 : influxd+0x2ab45
000000000733fce0 0000000000421eaf : 000000c00007b250 0000000000010000 000000000733fd78 000000000044a7ab : influxd+0x2331d
000000000733fd10 0000000000421d7b : 000000c002b7f340 0000000000010000 000000c004b5a7c8 000000000047e9a9 : influxd+0x21eaf
000000000733fd70 000000000047e9a9 : 000000000733fda8 0000000000485885 0000000006fee360 0000000000000000 : influxd+0x21d7b
000000000733fd90 0000000000421af3 : 000000c004b5a810 0000000000000008 0000000000010000 0000000000000020 : influxd+0x7e9a9
000000c004b5a7d8 0000000000412534 : 000000c002b7f340 0000000000472845 000000c002b7f340 000000c004b5a8f0 : influxd+0x21af3
000000c004b5a838 000000000047223e : 0000000000000000 0000000000000000 0000000000000000 000000004edddba8 : influxd+0x12534
000000c004b5a860 0000000000412625 : 0000000000000068 00000000045d21e0 0000000052610101 000000c004b5a958 : influxd+0x7223e
000000c004b5a900 0000000001143268 : 0000000000000030 0000000000000000 0000000000000000 0000000001145220 : influxd+0x12625
000000c004b5a928 00000000011a598e : 0000000000000000 0000000000000000 0000000000000008 000000003ffffed8 : influxd+0xd43268
000000c004b5a968 00000000011a403b : 000000c004b5a9f8 0000000052610130 0000000000000059 000000003ffffed0 : influxd+0xda598e
000000c004b5a9c0 00000000011a463b : 0000000000000000 0000000000000000 0000000000000000 0000000000000000 : influxd+0xda403b
000000c004b5aa68 000000000119693e : 0000000000479989 000000c00335c060 000000c001fa4000 000000c004b5abe0 : influxd+0xda463b
000000c004b5abd8 000000000118bba8 : 000000c004b5ac80 000000c00335c060 000000c004b5ac40 0000000000412625 : influxd+0xd9693e
000000c004b5ac08 00000000011aa8d3 : 000000c001c2e6d8 000000c00335c060 0000000000000027 0000000000000030 : influxd+0xd8bba8
000000c004b5ac90 000000000119081e : 0000000000000000 000000c00335c060 0000000000000027 0000000000000030 : influxd+0xdaa8d3
000000c004b5acc8 000000000112d013 : 000000c001a1a0f0 000000c00335c060 0000000000000027 0000000000000030 : influxd+0xd9081e
000000c004b5ad68 00000000011306d6 : 000000c00335c060 0000000000000027 0000000000000030 000000c004982620 : influxd+0xd2d013
000000c004b5ae00 000000000112ffea : 000000c00335c060 0000000000000027 0000000000000030 000000c0049881e8 : influxd+0xd306d6
000000c004b5aed0 000000000112f7d2 : 000000c00335c060 0000000000000027 0000000000000030 000000c004982620 : influxd+0xd2ffea
000000c004b5b000 000000000112e4b6 : 000000c00335c060 0000000000000027 0000000000000030 000000c004982620 : influxd+0xd2f7d2
000000c004b5b090 000000000112e32a : 000000c00335c060 0000000000000027 0000000000000030 000000c004982620 : influxd+0xd2e4b6
000000c004b5b130 00000000011378d9 : 000000c00335c060 0000000000000027 0000000000000030 000000c004982620 : influxd+0xd2e32a
000000c004b5b1d0 000000000113769d : 000000c0049e4000 000000c00335c060 000000c004990168 0000000000000000 : influxd+0xd378d9
000000c004b5b2d8 00000000027dbb5e : 000000c0049e4000 000000000703fa60 000000c004987f80 0000000000000016 : influxd+0xd3769d
000000c004b5b328 0000000002760390 : 000000c00461e140 0000000000000000 0000000000000001 000000c004b5b518 : influxd+0x23dbb5e
000000c004b5b3d8 0000000002784c8f : 000000c0049e6000 0000000000000000 0000000000000000 0000000000a70a00 : influxd+0x2360390
000000c004b5b528 00000000027848a5 : 000000c00460e300 000000c004530c40 0000000004f55b18 000000c0049e6000 : influxd+0x2384c8f
000000c004b5bab8 00000000023bf77a : 000000c00460e300 000000c004530c40 0000000004f42738 000000c004531f80 : influxd+0x23848a5
000000c004b5bb40 00000000023c13a6 : 000000c004113b80 0000000004f42738 000000c004552fc0 0000000004f2ca50 : influxd+0x1fbf77a
000000c004b5bb90 00000000023bf343 : 000000c004113b80 0000000004f42738 000000c004552fc0 c143e7f78bd9971f : influxd+0x1fc13a6
000000c004b5bd40 00000000023c91e8 : 000000c004113b80 0000000004f42738 000000c004552fc0 000000c004b5bfb0 : influxd+0x1fbf343
000000c004b5bde8 00000000012f706d : 0000000004f42770 0000000004f42738 000000c004552fc0 000000000703cca0 : influxd+0x1fc91e8
000000c004b5be10 00000000012f6ce8 : 0000000004f44378 000000c004113b80 0000000000000000 0000000000480981 : influxd+0xef706d
000000c004b5bfc0 0000000000480981 : 0000000000000000 0000000000000000 0000000000000000 0000000000000000 : influxd+0xef6ce8
000000c004b5bfe0 0000000000000000 : 0000000000000000 0000000000000000 0000000000000000 0000000000000000 : influxd+0x80981
Some more:
0 Id: 41c8.1160 Suspend: 0 Teb: 000000000028b000 Unfrozen
Child-SP RetAddr Call Site
00 000000000733f298 00007ffcaa47da3e ntdll!NtWaitForSingleObject+0x14
01 000000000733f2a0 00000000004821c9 KERNELBASE!WaitForSingleObjectEx+0x8e
02 000000000733f340 000000000048054d influxd+0x821c9
03 000000000733f4b0 0000000000480601 influxd+0x8054d
04 000000000733f4e0 000000000043c91b influxd+0x80601
05 000000000733f508 000000000043ca56 influxd+0x3c91b
06 000000000733f540 000000000043bb45 influxd+0x3ca56
07 000000000733f558 0000000000410e65 influxd+0x3bb45
08 000000000733f5c0 000000000043446e influxd+0x10e65
09 000000000733f628 000000000043436b influxd+0x3446e
0a 000000000733f670 0000000000433fff influxd+0x3436b
0b 000000000733f6d8 0000000000410f96 influxd+0x33fff
0c 000000000733f6f0 000000000043d677 influxd+0x10f96
0d 000000000733f720 000000000044fd55 influxd+0x3d677
0e 000000000733fc80 0000000000424f96 influxd+0x4fd55
0f 000000000733fc98 000000000042ab45 influxd+0x24f96
10 000000000733fcc8 000000000042331d influxd+0x2ab45
11 000000000733fce0 0000000000421eaf influxd+0x2331d
12 000000000733fd10 0000000000421d7b influxd+0x21eaf
13 000000000733fd70 000000000047e9a9 influxd+0x21d7b
14 000000000733fd90 0000000000421af3 influxd+0x7e9a9
15 000000c004b5a7d8 0000000000412534 influxd+0x21af3
16 000000c004b5a838 000000000047223e influxd+0x12534
17 000000c004b5a860 0000000000412625 influxd+0x7223e
18 000000c004b5a900 0000000001143268 influxd+0x12625
19 000000c004b5a928 00000000011a598e influxd+0xd43268
1a 000000c004b5a968 00000000011a403b influxd+0xda598e
1b 000000c004b5a9c0 00000000011a463b influxd+0xda403b
1c 000000c004b5aa68 000000000119693e influxd+0xda463b
1d 000000c004b5abd8 000000000118bba8 influxd+0xd9693e
1e 000000c004b5ac08 00000000011aa8d3 influxd+0xd8bba8
1f 000000c004b5ac90 000000000119081e influxd+0xdaa8d3
20 000000c004b5acc8 000000000112d013 influxd+0xd9081e
21 000000c004b5ad68 00000000011306d6 influxd+0xd2d013
22 000000c004b5ae00 000000000112ffea influxd+0xd306d6
23 000000c004b5aed0 000000000112f7d2 influxd+0xd2ffea
24 000000c004b5b000 000000000112e4b6 influxd+0xd2f7d2
25 000000c004b5b090 000000000112e32a influxd+0xd2e4b6
26 000000c004b5b130 00000000011378d9 influxd+0xd2e32a
27 000000c004b5b1d0 000000000113769d influxd+0xd378d9
28 000000c004b5b2d8 00000000027dbb5e influxd+0xd3769d
29 000000c004b5b328 0000000002760390 influxd+0x23dbb5e
2a 000000c004b5b3d8 0000000002784c8f influxd+0x2360390
2b 000000c004b5b528 00000000027848a5 influxd+0x2384c8f
2c 000000c004b5bab8 00000000023bf77a influxd+0x23848a5
2d 000000c004b5bb40 00000000023c13a6 influxd+0x1fbf77a
2e 000000c004b5bb90 00000000023bf343 influxd+0x1fc13a6
2f 000000c004b5bd40 00000000023c91e8 influxd+0x1fbf343
30 000000c004b5bde8 00000000012f706d influxd+0x1fc91e8
31 000000c004b5be10 00000000012f6ce8 influxd+0xef706d
32 000000c004b5bfc0 0000000000480981 influxd+0xef6ce8
33 000000c004b5bfe0 0000000000000000 influxd+0x80981
1 Id: 41c8.b98 Suspend: 0 Teb: 00000000`00293000 Unfrozen
Child-SP RetAddr Call Site
00 000000004d18faa8 00007ffcaa47da3e ntdll!NtWaitForSingleObject+0x14
01 000000004d18fab0 00000000004821c9 KERNELBASE!WaitForSingleObjectEx+0x8e
02 000000004d18fb50 000000000048054d influxd+0x821c9
03 000000004d18fcc0 0000000000480601 influxd+0x8054d
04 000000004d18fcf0 000000000043c91b influxd+0x80601
05 000000004d18fd10 000000000043ca56 influxd+0x3c91b
06 000000004d18fd48 000000000044f97a influxd+0x3ca56
07 000000004d18fd60 000000000044f4d7 influxd+0x4f97a
08 000000004d18fdc0 000000000044611d influxd+0x4f4d7
09 000000004d18fe40 000000000044606a influxd+0x4611d
0a 000000004d18fe68 000000000047e8a5 influxd+0x4606a
0b 000000004d18fe90 0000000002907c11 influxd+0x7e8a5
0c 000000004d18fe98 0000000000000000 influxd!preUpdateHookTrampoline+0xd4571
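Without symbols, those influxd+0x... frames are hard to map back to Go functions. One thing that may turn out to be more useful than a native .dmp, if the HTTP listener is still accepting connections when the hang begins, is a Go goroutine dump from influxd's built-in pprof endpoint (available in 2.x unless profiling is disabled in the config), since it names the locks and channels each goroutine is parked on. A rough sketch, with myserver again a placeholder:

package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
	"time"
)

func main() {
	// debug=2 asks the Go runtime for the full stack of every goroutine.
	// Note: if the deadlock has already wedged the HTTP listener, this will time out,
	// so it is worth capturing as soon as the first "task was cancelled" errors appear.
	const url = "http://myserver:8086/debug/pprof/goroutine?debug=2"

	client := &http.Client{Timeout: 30 * time.Second}
	resp, err := client.Get(url)
	if err != nil {
		fmt.Fprintln(os.Stderr, "pprof fetch failed:", err)
		os.Exit(1)
	}
	defer resp.Body.Close()

	out, err := os.Create("goroutines-" + time.Now().Format("20060102-150405") + ".txt")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer out.Close()

	if _, err := io.Copy(out, resp.Body); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("wrote", out.Name())
}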
Update: I just tried running this installation of InfluxDB on a different hard drive to rule out disk issues, and the same deadlock occurred.
@domticchione
We just recently deleted the entire database to start fresh and the problem is still happening.
I'm curious if you could potentially provide me with a reproducer. I was able to reproduce a similar issue in v2.7.11 using the following micro-program:
Would you be able to run this against Influx running on Windows using v2.7.12 and try to reproduce it? If not, would you be able to send me a script or small program that you were able to reproduce the issue with? Something portable, so I can debug as I run it locally.
Thanks!
Reproducing has been very difficult for me due to the scale of the databases it's happening in. All of the servers it's occurring on have a few years' worth of data (between 1 and 5) with thousands of series (~5,000-20,000), and I just don't have the tools/knowledge to replicate it in a way that's shareable with you (basically it's only happening in production for me).
I will certainly look into your micro-program and get back to you with an answer in the coming days. I appreciate the support on this issue!
PS: if you have tips on how to run the micro-program, that would be appreciated. I will be the first to admit I am not proficient in Go, but with AI nowadays I'm sure I'll sort it out eventually.
There is a README.md that should give general information on how to run the program, including the CLI commands to build the dummy_reader and dummy_writer binaries. To install Go you can use the following page: https://go.dev/dl/, which includes an MSI installer for Windows. You can use PowerShell to build and run the shell scripts; you may need to adjust them to work with PowerShell, as I've only used them on Linux and macOS machines.
@devanbenz Thank you for the detailed write-up. I've got both exes running now. storage_bucket_series_num hit 20,000 in ~30-60 seconds, and storage_shard_series was even quicker to hit 20,000. No deadlock yet. I'm going to let this run for a while and see what happens over the course of the day.
@devanbenz I started testing with version 2.7.11 to verify I could replicate the issue you saw, and I was able to. I was no longer able to run either the read or the write exe, due to a bad connection to the bucket, until after I rebooted the server. I then switched over to 2.7.12, followed the same procedure, and was unable to get the deadlock to occur. I let both the reader and the writer run overnight and it still did not reproduce.
I also want to mention that the issue I was seeing originally did not only limit access to one bucket; it took down the entire web-based UI (Chronograf?) as well and limited access to ALL buckets/APIs. I can keep tinkering, but if you have other ideas, feel free to share!
@domticchione If possible, could you please provide me with a selection of the queries that are being run on InfluxDB prior to you seeing the crash? I'm currently investigating the code paths right after the logs you provided above:
ts=2025-09-08T18:28:42.677880Z lvl=debug msg="bucket find" log_id=0yreEBzW000 store=new took=0.000ms
ts=2025-09-08T18:28:42.677880Z lvl=debug msg="bucket find" log_id=0yreEBzW000 store=new took=0.000ms
ts=2025-09-08T18:28:42.677880Z lvl=debug msg="bucket find" log_id=0yreEBzW000 store=new took=0.000ms
@devanbenz I noticed that log as well and found it strange. That log was almost always the last log before I encountered a deadlock. We send a lot of queries (probably more than what's recommended), so it is hard to narrow down, but the main queries being sent at the times of the deadlocks are:
Used for checking connection
import "experimental/http"
import "csv"
response = http.get(
url: "http://myserver:8086/health")
httpStatus = response.statusCode
responseBody = string(v: response.body)
responseHeaders = response.headers
date = responseHeaders.Date
contentLength = responseHeaders["Content-Length"]
contentType = responseHeaders["Content-Type"]
csvData = "#datatype,string,long,string
#group,false,false,false
#default,,,
,result,table,column
,,0,*
"
csv.from(csv: csvData)
  |> map(fn: (r) => ({
      httpStatus: httpStatus,
      responseBody: responseBody,
      date: date,
      contentLength: contentLength,
      contentType: contentType,
  }))
Used for checking if buckets exist
buckets()
Used to check for time sync issues
import "array"
import "system"
array.from(rows: [{ _time: system.time() }])
Used to get latest values
from(bucket: "MyBucket") |> range(start: 0) |> filter(fn: (r) => r._measurement == "myMeas" and (r._field == "myField" or r._field == "myField2" or r._field == "myField3")) |> last() |> yield(name: "myDataSetName")
It is probably worth mentioning that we also have a writing tool (using InfluxDB's C# API) that writes to potentially every series 24/7, around the clock. We do apply compression most of the time, which prevents a lot of writes from hitting the DB; however, sometimes those settings are outdated or need tweaking. These read calls could also be executing all day, depending on whether users leave a specific page open (usually by mistake or forgetfulness). I believe the default interval is a query every 2-5 seconds (I think it's 5, but it can be lowered if desired). Almost every deadlock occurred with minimal use being applied to the database. We think it's possible that someone just left a page open and the queries kept pouring in, not overloading the resources but confusing Influx enough to the point of a deadlock. Maybe some type of asynchronous error.
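For what it's worth, a portable reproducer for that workload would roughly amount to one goroutine writing points to many series around the clock while another repeats the last() query every few seconds. The sketch below uses the influxdb-client-go v2 library with placeholder URL/org/token values and the stand-in measurement and field names from the queries above; it mimics the shape of our C# tooling rather than reproducing it exactly:

package main

import (
	"context"
	"fmt"
	"time"

	influxdb2 "github.com/influxdata/influxdb-client-go/v2"
)

func main() {
	// Placeholder connection details; substitute the real URL, token, and org.
	client := influxdb2.NewClient("http://myserver:8086", "my-token")
	defer client.Close()

	writeAPI := client.WriteAPIBlocking("my-org", "MyBucket")
	queryAPI := client.QueryAPI("my-org")

	// Writer: continuously write points across ~200 series, like the C# writing tool.
	go func() {
		for i := 0; ; i++ {
			p := influxdb2.NewPoint("myMeas",
				map[string]string{"series": fmt.Sprintf("s%03d", i%200)},
				map[string]interface{}{"myField": float64(i)},
				time.Now())
			if err := writeAPI.WritePoint(context.Background(), p); err != nil {
				fmt.Println("write error:", err)
			}
		}
	}()

	// Reader: the "latest values" dashboard query, repeated every few seconds.
	flux := `from(bucket: "MyBucket")
	  |> range(start: 0)
	  |> filter(fn: (r) => r._measurement == "myMeas" and r._field == "myField")
	  |> last()`
	for {
		result, err := queryAPI.Query(context.Background(), flux)
		if err != nil {
			// This is roughly where the C# client reports "A task was cancelled" once the server hangs.
			fmt.Println("query error:", err)
		} else {
			for result.Next() {
				// Drain the result; the values themselves are not needed for the reproduction attempt.
			}
			if result.Err() != nil {
				fmt.Println("result error:", result.Err())
			}
		}
		time.Sleep(5 * time.Second)
	}
}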
Are most/all of the queries you're sending going through Flux and the C# tooling? I'm going to continue looking into this early next week. I'm also curious whether you were able to build InfluxDB with debug symbols at all? It would make reading the stack trace much easier.
Yes, all are from the C#/VB InfluxDB API. And I was not able to get it to compile, unfortunately.