pdns
pdns recursor crashes on startup; the probability of a crash is relatively high
- system version: Debian 9
- kernel version: Linux 4.19.117.bsk.10-amd64
- pdns recursor version: 4.7.1
- gcc/g++ version: 6.3.0
- crash info: Thread 8 "pdns-r/distr" received signal SIGSEGV, Segmentation fault.
- crash code position:
  - file name: syncres.cc
  - in class: nsspeeds_t
  - line: 227
- problem cause: The lambda function sometimes uses the now variable after it has been released from the stack. The SyncRes sr object may already have been destroyed, and its member d_now with it, so the now variable inside the lambda is invalid and the program crashes.
- proposed fix: change line 227 as follows, so that the lambda captures now by value:
ind.modify(it, [now](DecayingEwmaCollection& d) { d.d_lastget = now; });
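The difference between the two capture modes can be shown with a minimal sketch. All names here (Timestamp, Entry, Modifier, makeByValue, makeFromLocal, applyLater) are hypothetical stand-ins, not pdns types, and the sketch assumes the closure outlives the variable it was built from, which is the scenario described above and not something confirmed for the recursor:

```cpp
#include <cassert>
#include <functional>

// Hypothetical stand-ins; these are not the real pdns types.
struct Timestamp { long sec; long usec; };
struct Entry { Timestamp lastGet{0, 0}; };

using Modifier = std::function<void(Entry&)>;

// Captures `now` by value: the closure owns its own copy of the timestamp,
// so it stays valid even if it runs after the caller's variable is gone.
Modifier makeByValue(const Timestamp& now)
{
  return [now](Entry& e) { e.lastGet = now; };
}

// Builds a closure from a short-lived local and hands it back to the caller.
Modifier makeFromLocal(long sec)
{
  Timestamp local{sec, 0};
  // Safe: `local` is copied into the closure. Returning
  // [&local](Entry& e) { e.lastGet = local; } instead would leave the
  // closure holding a dangling reference once this function returns.
  return makeByValue(local);
}

// Runs the closure after the variable it was created from has been destroyed.
long applyLater(long sec)
{
  Modifier m = makeFromLocal(sec); // `local` no longer exists here
  assert(m);                       // the closure itself is still valid
  Entry e;
  m(e);
  return e.lastGet.sec;
}
```

A by-value capture only matters when the closure can run after the captured variable's lifetime has ended; if the closure is invoked before the enclosing call returns, both capture modes behave the same.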
Thanks for the report. It's not clear to me yet what sequence of events could cause the scenario you describe. I'm also wondering why you are seeing this and others do not. Do you have a backtrace, perhaps? The configuration file would also be nice. Before fixing this I would like to fully understand it and perhaps write a regression test for it.
bt:
#8 0x00005555559419ef in SyncRes::shuffleInSpeedOrder (this=this@entry=0x7fffd405cec0, tnameservers=std::unordered_map with 13 elements = {...}, prefix="", auth=...) at syncres.cc:1808
#9 0x000055555591b62b in SyncRes::doResolveAt (this=this@entry=0x7fffd405cec0, nameservers=..., auth=..., flawedNSSet=
code:
description: You don't need to look at the stack; the problem shows up when analyzing the code logic. When the sr.beginResolve call ends, the getRootNS function returns as well, and at that point the sr object is destructed. If the lambda in the fastest function of nsspeeds_t has not yet been executed, it accesses invalid data and the process crashes.
Is this a backtrace of the crash you are referring to?
I see shuffleInSpeedOrder being executed, but getRootNS and beginResolve are on the stack. So the SyncRes object is still alive.
At this moment I still have trouble seeing how fastest and the lambda could be executed while the corresponding SyncRes has gone out of scope. beginResolve is a synchronous function: it returns only after the work is done (even though the name would suggest async execution).
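The synchronous case can be sketched as follows. This is only an illustration under stated assumptions: Resolver, Entry, Timestamp, modifyNow and resolveOnce are hypothetical names, and modifyNow is assumed to invoke its functor before returning; it is not the actual pdns code path.

```cpp
#include <cassert>

// Hypothetical stand-ins; these are not the real pdns types.
struct Timestamp { long sec; };
struct Entry { long lastGet = 0; };

// Invokes the modifier immediately, before returning to the caller.
template <typename Modifier>
void modifyNow(Entry& entry, Modifier mod)
{
  mod(entry);
}

struct Resolver
{
  Timestamp d_now{0};

  // The lambda captures `now` by reference, but it is run inside modifyNow,
  // that is, while this member function (and thus *this) is still alive.
  void touch(Entry& entry) const
  {
    const Timestamp& now = d_now;
    modifyNow(entry, [&now](Entry& e) { e.lastGet = now.sec; });
  }
};

// The resolver lives on this stack frame for the whole synchronous call,
// so the reference captured inside touch() never dangles.
long resolveOnce(long sec)
{
  Entry entry;
  assert(entry.lastGet == 0); // entry starts untouched
  Resolver resolver;
  resolver.d_now.sec = sec;
  resolver.touch(entry);
  return entry.lastGet;
}
```

In this arrangement the reference capture is safe because nothing stores the closure for later use; a crash would require the closure to be kept and invoked after resolveOnce has returned.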
I have thought about this a bit more but still have trouble seeing how the circumstances you describe could happen: SyncRes being out of scope while fastest is being executed.
I really would appreciate both a config file and a full backtrace (not leaving out the topmost frames) of an actual crash you observed.
Hello @zjs604381586, it has been a week since my questions. Do you have answers?
I also looked at the logic; maybe my analysis is wrong, sorry.