lnd
panic: runtime error: invalid memory address or nil pointer dereference
Background
The LND process shuts down and will not start again.
Your environment
- lnd: 0.15.0-beta, now 0.15.1-beta.rc2 (pre-compiled binary from the release)
- OS: Ubuntu amd64 (fully updated)
- bitcoind: v23.0.0
- Hardware: dedicated Intel Core i7, 32 GB RAM, 1 TB SSD
- Node age: 2+ months
- RTL 0.12.3-beta, BOS 12.16.3
Steps to reproduce
LND went offline during normal operation and did not come back up.
Expected behaviour
LND starts normally.
Actual behaviour
LND does not start; systemd keeps trying to restart it:
● lnd.service - LND Lightning Network Daemon
     Loaded: loaded (/etc/systemd/system/lnd.service; enabled; vendor preset: enabled)
     Active: activating (auto-restart) (Result: exit-code) since Mon 2022-08-29 22:53:47 MSK; 29s ago
    Process: 20086 ExecStart=/usr/local/bin/lnd (code=exited, status=2)
   Main PID: 20086 (code=exited, status=2)
        CPU: 4.207s
Updating LND (0.15.0-beta to 0.15.1-beta.rc2) and upgrading all of Ubuntu to the latest versions had no effect. The problem appeared suddenly; nothing had been changed (settings, restarts or anything else). The process simply crashed for no apparent reason and did not come back up.
Log file attached: log.txt
I would be grateful for any advice.
UPD: interestingly, the log line "[INF] LNWL: Removed invalid transaction" repeats on every restart, with the same transaction each time.
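To check whether the repeated message really refers to the same transaction every time, one option is to count identical log messages with the timestamp stripped. This is a minimal sketch, assuming lnd's default mainnet log path (~/.lnd/logs/bitcoin/mainnet/lnd.log) and log lines that begin with a date field and a time field, as in the lines quoted in this thread; count_repeated_messages is a hypothetical helper name.

```shell
# Hypothetical helper: count identical log messages containing a pattern,
# ignoring the leading "DATE TIME" fields so repeats across restarts collapse.
# Assumes lnd's default mainnet log location unless a file is passed as $1.
count_repeated_messages() {
  local log="${1:-$HOME/.lnd/logs/bitcoin/mainnet/lnd.log}"
  local pattern="${2:-Removed invalid transaction}"
  # cut -f3- drops the date and time; uniq -c counts identical remaining lines,
  # and the final sort puts the most frequent message first.
  grep -F -- "$pattern" "$log" | cut -d' ' -f3- | sort | uniq -c | sort -rn
}
```

A single output line with a high count would confirm the same message on every restart. Only the first line of a multi-line log entry is matched.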
@yyforyongyu
Could you turn on debug or trace log and share the logs again?
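Because lnd exits within seconds here, lncli debuglevel (which changes the level on a running node) may not get a chance to run; setting lnd's debuglevel option in lnd.conf before the next restart is an alternative. A minimal sketch, assuming the default config path ~/.lnd/lnd.conf with an existing [Application Options] section and GNU sed as shipped with Ubuntu; enable_debug_log is a hypothetical helper name.

```shell
# Hypothetical helper: set lnd's debuglevel option in lnd.conf.
# Assumes GNU sed (for -i and the one-line "a" command) and an existing
# [Application Options] section, which is where lnd reads this option.
# Only matches an uncommented "debuglevel=" line without spaces around "=".
enable_debug_log() {
  local conf="${1:-$HOME/.lnd/lnd.conf}"
  local level="${2:-debug}"   # e.g. debug or trace
  if grep -q '^debuglevel=' "$conf"; then
    # Replace an existing setting in place.
    sed -i "s/^debuglevel=.*/debuglevel=${level}/" "$conf"
  else
    # Insert the setting directly below the section header.
    sed -i "/^\[Application Options\]/a debuglevel=${level}" "$conf"
  fi
}
```

For example, back up lnd.conf, run enable_debug_log "$HOME/.lnd/lnd.conf" trace, then sudo systemctl restart lnd. Adjust the path if lnd runs under a dedicated user.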
Related to https://github.com/lightningnetwork/lnd/pull/6624?
@yyforyongyu, of course. Debug log attached: log_debug.txt
Are you running any node management software like RTL, lndg, etc.?
Yes, I use BOS (12.16.3) and RTL (0.12.3-beta).
UPD: I just stopped RTL (sudo systemctl stop rtl); nothing changed.
Have you recently changed your BOS version?
No. My configuration hasn't changed in over two weeks.
- When lnd went offline, did it panic, or did you simply shut it down normally?
- Have you previously used a different BOS version?
- What was your configuration like when lnd went down the first time?
- Are you running, or have you ever run, other node software besides RTL/BOS?
- What commands are you running with RTL/BOS?
- Panic, as after every restart now.
- Yes. My last BOS update was about 3-4 weeks ago.
- After the first crash I updated LND (0.15.0-beta to 0.15.1-beta.rc2) and all Ubuntu packages via apt. BOS, RTL and bitcoind are untouched so far.
- No.
- Mostly basic functionality:
  - RTL: opening/closing channels, adjusting fees, getting statistics (reports), circular payments (rebalancing), generating on-chain addresses, reconnecting peers.
  - BOS: Telegram integration, opening balanced channels, rebalancing, reconnecting peers, tag management (to avoid certain nodes when rebalancing).
  I think that's all.
The very first time it went down, was it a panic, or did you stop it and then couldn't start it again?
Do you know which BOS version you were on before the update?
Panic. I had done absolutely nothing at that moment or for many hours before. The node was working perfectly fine, and then I received "connection lost" in Telegram.
It's hard to say exactly, but no lower than 12.13.x. I'd guess 12.13.4.
Gotcha, and you were on BOS 12.16.3 at the time of the initial crash?
Yes, exactly.
For RTL, how are you reconnecting to your peers? Is there a button or some API call that you are using?
I select the peer (usually one with a disabled channel) and click "disconnect" (message: "peer disconnected successfully"), then press "connect to peer" and enter publicKey or publicKey@address:port. If the peer connects OK, the channel often becomes active again.
If the peer is not already connected (not in the list), I just try to connect to it.
I think this GUI uses lncli disconnect/connect.
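For reference, the disconnect-then-connect sequence described above corresponds roughly to the following lncli calls. This is a sketch under the thread's own assumption that RTL does the equivalent of lncli disconnect/connect (it may call lnd's API directly instead); reconnect_peer is a hypothetical name, and no delay is placed between the two calls, matching the back-to-back clicks described in the thread.

```shell
# Hypothetical helper mirroring the RTL "disconnect" + "connect to peer" steps.
# $1: peer public key, $2: host:port of the peer.
reconnect_peer() {
  local pubkey="$1" hostport="$2"
  # Drop the existing connection to the peer.
  lncli disconnect "$pubkey"
  # Reconnect immediately, as when both buttons are clicked in quick succession.
  lncli connect "${pubkey}@${hostport}"
}
```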
- How long after clicking "disconnect" do you press "connect to peer" and enter the info?
- Did you tend to route a lot of HTLCs?
- Well... I don't wait at all. Seconds, obviously: however long it takes between two mouse clicks :)
- My maximum is 80-90 HTLCs, but that happens very rarely (not even every day); the average is 30-40, sometimes 10, sometimes 50. High values usually last only minutes and then drop fairly quickly.
Yeah, just double-checking; the timing matters here.
Is that 30-40 on average per day?
30-40 simultaneously on average, i.e. active HTLCs at any one moment. I open RTL ("active HTLCs") and see the count there, which is usually 30-40. Sometimes it's 80-90, but that doesn't last long and drops fairly quickly. How many per day, I don't know.
UPD: the count is usually higher when I rebalance with bos rebalance. If I do nothing at all, it's usually 15-30.
Can I try something? Save backups and change something? A second day with the node offline is painful... expired HTLCs, automatic force-closes of many good channels...
However, if it's not safe, I have no choice. Either way, I am very grateful for the help and assistance in this situation.
@yyforyongyu do you mind rebasing your PR on top of 0.15.1?
That way @varg-finance can apply https://github.com/lightningnetwork/lnd/pull/6624 and use the other channels. AFAIK the patch hasn't been tested on a live node. You may be able to close the channel point in question without error, but I'm not sure.
Actually, if you end up using that patch once it's ready, I don't think you should try to close the buggy channel point; it might lead to a similar situation. If you have issues with that patch (lnd not starting up, but the panic is gone), we can give you a temporary patch on top of it.
OK. I'm ready to provide all the logs or any other information, and won't do anything without your approval.
Here's the branch rebased: https://github.com/lightningnetwork/lnd/compare/v0.15.1-branch...Roasbeef:lnd:remote-log-rebase?expand=1
You can clone it from: https://github.com/Roasbeef/lnd/tree/remote-log-rebase
Unfortunately, it seems I need some help with the installation. I've always used the pre-built binaries from the releases before; sorry for taking up your time :(
- I tried cloning https://github.com/Roasbeef/lnd/tree/remote-log-rebase via HTTPS (git clone https://github.com/Roasbeef/lnd.git). Installation failed:
- I tried downloading the zip file, unzipping it and installing. Also unsuccessful:
I understand these are probably the simplest mistakes, but I need assistance.
I have go version go1.18 linux/amd64.
@varg-finance after the git clone, you need to check out the branch with git checkout remote-log-rebase; then make install should work.
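The three steps above can be wrapped into one function. This is a minimal sketch, assuming git, make and a Go toolchain are installed; build_lnd_branch is a hypothetical name.

```shell
# Hypothetical helper: clone a repository, switch to a branch, and install.
# $1: repository URL, $2: branch name, $3: target directory (default: lnd).
build_lnd_branch() {
  local repo_url="$1" branch="$2" dest="${3:-lnd}"
  git clone "$repo_url" "$dest" || return 1
  # Run in a subshell so the caller's working directory is left unchanged.
  (
    cd "$dest" &&
      git checkout "$branch" &&
      make install
  )
}
```

For example, build_lnd_branch https://github.com/Roasbeef/lnd.git remote-log-rebase, then lnd --version to confirm which binary the shell actually picks up. git clone --branch remote-log-rebase would combine the first two steps.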
@yyforyongyu, something is still wrong.
Nothing changed; LND behaves exactly the same.
Fresh debug log: new 1.txt
make install will put the binary in your $GOPATH/bin directory. So look for an lnd binary there, and add $GOPATH/bin to your $PATH.
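A sketch of the PATH change, assuming a single GOPATH entry and a POSIX shell. When GOPATH is unset, Go uses $HOME/go, which go env GOPATH reports. Note that the systemd unit shown at the top starts /usr/local/bin/lnd by absolute path, so changing PATH alone does not change which binary the service runs; the new binary would also have to replace that file, or ExecStart would have to point at it.

```shell
# Resolve GOPATH the way Go does when the variable is unset.
# Assumes a single GOPATH entry (not a colon-separated list).
if [ -z "${GOPATH:-}" ]; then
  GOPATH=$(go env GOPATH 2>/dev/null || true)
  [ -n "$GOPATH" ] || GOPATH="$HOME/go"
fi

# Prepend $GOPATH/bin once, so a freshly built lnd is found before older copies.
case ":$PATH:" in
  *":$GOPATH/bin:"*) ;;                             # already present
  *) PATH="$GOPATH/bin:$PATH"; export PATH ;;
esac
```

To make this persistent for interactive shells, the same lines can go into ~/.profile.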
Finally, it runs: lnd version 0.15.1-beta commit=v0.3-alpha-11443-g5cbdcde5e
LND started and worked normally for 1-2 minutes, then shut down (NOT a panic) after these lines:
2022-08-31 13:11:21.152 [ERR] CNCT: ChannelArbitrator(67d1c841ad1ff92666ec3829f8c18d57b3fda89b8ba37bc71c2eca2731ce8675:0): unable to force close: No HTLC with ID 670 in channel 750573:1517:0
2022-08-31 13:11:21.152 [ERR] CNCT: ChannelArbitrator(67d1c841ad1ff92666ec3829f8c18d57b3fda89b8ba37bc71c2eca2731ce8675:0): unable to advance state: No HTLC with ID 670 in channel 750573:1517:0
2022-08-31 13:11:21.152 [INF] CNCT: ChainArbitrator shutting down
Full log: log 3.txt
UPD: every restart ends the same way.
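The channel in the error is printed as a short channel ID in block:transaction:output form (here block 750573, transaction index 1517, output 0). Some lncli commands, such as getchaninfo, take the same ID as a single 64-bit integer, packed as block << 40 | tx << 16 | output following the BOLT 7 short_channel_id layout. A minimal sketch of the conversion; scid_to_uint64 is a hypothetical name, and it only reads values, it does not touch the channel.

```shell
# Hypothetical helper: convert "BLOCK:TX:OUTPUT" to the packed 64-bit form.
scid_to_uint64() {
  local scid="$1" block rest tx out
  block=${scid%%:*}   # text before the first colon
  rest=${scid#*:}     # "TX:OUTPUT"
  tx=${rest%%:*}
  out=${rest#*:}
  # 3 bytes block height, 3 bytes tx index, 2 bytes output index.
  echo $(( (block << 40) | (tx << 16) | out ))
}
```

For example, scid_to_uint64 750573:1517:0 prints the numeric ID for the channel named in the log above.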