
panic: runtime error: invalid memory address or nil pointer dereference

Open varg-finance opened this issue 1 year ago • 42 comments

Background

LND-Process is shutting down

Your environment

  • lnd 0.15.0-beta, now 0.15.1-beta.rc2 (pre-compiled binary from the release)
  • ubuntu amd64 (fully updated)
  • bitcoind v23.0.0
  • dedicated Intel Core-i7, 32 GB RAM, 1 TB SSD
  • node's age: 2+ months
  • RTL 0.12.3-beta
  • BOS 12.16.3

Steps to reproduce

During operation, LND went offline and did not restart again.

Expected behaviour

Start LND

Actual behaviour

LND did not start

● lnd.service - LND Lightning Network Daemon
     Loaded: loaded (/etc/systemd/system/lnd.service; enabled; vendor preset: enabled)
     Active: activating (auto-restart) (Result: exit-code) since Mon 2022-08-29 22:53:47 MSK; 29s ago
    Process: 20086 ExecStart=/usr/local/bin/lnd (code=exited, status=2)
   Main PID: 20086 (code=exited, status=2)
        CPU: 4.207s

Updating LND (0.15.0-beta --> 0.15.1-beta.rc2) and upgrading all Ubuntu packages to the latest versions had no effect. The problem appeared suddenly; no changes were made (settings, restarts, or anything else). The process just crashed for no apparent reason and did not restart again.

Log file in attachment. log.txt

I will be grateful for any advice.

UPD: an interesting detail in the log file: "[INF] LNWL: Removed invalid transaction" repeats on every restart with the same transaction...
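One way to confirm that observation is to count the matching lines and list the distinct text that follows the message. This is a minimal sketch, not part of the original report: `count_log_message` and `distinct_log_tails` are hypothetical helper names, and the log is assumed to be a plain-text file with one entry per line.

```shell
# Hypothetical helpers for inspecting a plain-text lnd log file.

# count_log_message FILE TEXT
# Prints how many lines of FILE contain TEXT (matched as a fixed string).
count_log_message() {
    local file="$1" text="$2"
    # grep exits with status 1 when nothing matches; it still prints 0,
    # so the exit status is deliberately ignored.
    grep -cF -- "$text" "$file" || true
}

# distinct_log_tails FILE TEXT
# Prints each distinct remainder of the line after TEXT, with a count.
# A single output row means every occurrence refers to the same thing.
distinct_log_tails() {
    local file="$1" text="$2"
    grep -F -- "$text" "$file" |
        awk -v t="$text" '{ i = index($0, t); print substr($0, i + length(t)) }' |
        sort | uniq -c
}
```

For example, `count_log_message ~/.lnd/logs/bitcoin/mainnet/lnd.log 'LNWL: Removed invalid transaction'` would count the occurrences; the path shown is an assumption about a default Linux install and may differ on this node.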

varg-finance avatar Aug 29 '22 20:08 varg-finance

@yyforyongyu

Crypt-iQ avatar Aug 29 '22 20:08 Crypt-iQ

Could you turn on debug or trace log and share the logs again?

yyforyongyu avatar Aug 29 '22 21:08 yyforyongyu

Related to https://github.com/lightningnetwork/lnd/pull/6624?

Roasbeef avatar Aug 29 '22 21:08 Roasbeef

@yyforyongyu , of course. Debug log in attachment. log_debug.txt

varg-finance avatar Aug 30 '22 06:08 varg-finance

are you running any node management software like RTL/lndg/etc?

Crypt-iQ avatar Aug 30 '22 12:08 Crypt-iQ

yes, I use BOS (12.16.3) and RTL (0.12.3-beta).

UPD: just stopped RTL (sudo systemctl stop rtl), nothing changed

varg-finance avatar Aug 30 '22 12:08 varg-finance

Have you recently changed your BOS version?

Crypt-iQ avatar Aug 30 '22 15:08 Crypt-iQ

No. My configuration hasn't changed in two+ weeks.

varg-finance avatar Aug 30 '22 16:08 varg-finance

  • When did lnd go offline, did it panic or did you simply just shut it down normally?
  • Have you previously used a different BOS version?
  • What was your configuration like when lnd went down the first time?
  • Are you running / have you ever run other node software besides rtl/bos?
  • what commands are you running with rtl/bos?

Crypt-iQ avatar Aug 30 '22 16:08 Crypt-iQ

  1. panic. like after every restart now.
  2. yes. last update of BOS I made around... 3-4 weeks ago.
  3. after first time I updated LND (0.15.0-beta --> 0.15.1-beta.rc2) and all Ubuntu packages via apt. BOS, RTL, bitcoind stay untouched for now.
  4. no.
  5. mostly basic functionality:
     RTL) open/close channels, adjust fees, acquire statistics (reports), circular payments (rebalance), generating on-chain addresses, reconnect peers
     BOS) telegram integration, opening balanced channels, rebalance, reconnect peers, tags management (to avoid certain nodes when rebalancing)... ...perhaps that's all.

varg-finance avatar Aug 30 '22 16:08 varg-finance

  1. panic. like after every restart now.

The very first time it went down, it was a panic or did you stop it and then couldn't start again?

  1. yes. last update of BOS I made around... 3-4 weeks ago.

do you know the version you were on before the update?

Crypt-iQ avatar Aug 30 '22 16:08 Crypt-iQ

The very first time it went down, it was a panic or did you stop it and then couldn't start again?

Panic. I did absolutely nothing at that moment or for many hours before. The node worked perfectly fine, and then I received "connection lost" in Telegram.

do you know the version you were on before the update?

It's hard to say exactly, but not lower than 12.13.x. I guess... 12.13.4.

varg-finance avatar Aug 30 '22 16:08 varg-finance

Gotcha, and you were on BOS 12.16.3 at the time of the initial crash?

Crypt-iQ avatar Aug 30 '22 16:08 Crypt-iQ

Yes, exactly.

varg-finance avatar Aug 30 '22 16:08 varg-finance

For RTL, how are you reconnecting to your peers? Is there a button or some API call that you are using?

Crypt-iQ avatar Aug 30 '22 16:08 Crypt-iQ

I choose peer (with disabled channel usually), click on "disconnect" (message "peer disconnected successfully"), thereafter press "connect to peer" and enter publicKey or publicKey[@]address:port . If peer connected ok, the channel often becomes active.

If the peer is not already connected (not in the list), I try to connect to it.

I think, this GUI uses lncli disconnect/connect.

varg-finance avatar Aug 30 '22 16:08 varg-finance

  • How long after "disconnect" does it take you to press "connect to peer" and enter the info?
  • Did you tend to route a lot of HTLC's?

Crypt-iQ avatar Aug 30 '22 16:08 Crypt-iQ

  1. Well... I don't wait anything... Seconds, obviously. How much time between two mouse clicks : )
  2. My maximum is 80-90 HTLC's, but it happened very rarely (not every day at least), average value is 30-40. Sometimes 10, sometimes 50, average - 30-40. High values usually don't last long, minutes, thereafter it goes down pretty fast.

varg-finance avatar Aug 30 '22 16:08 varg-finance

  1. Well... I don't wait anything... Seconds, obviously. How much time between two mouse clicks : )

Yeah just double-checking, the timing matters here

  1. My maximum is 80-90 HTLC's, but it happened very rarely (not every day at least), average value is 30-40. Sometimes 10, sometimes 50, average - 30-40. High values usually don't last long, minutes, thereafter it goes down pretty fast.

Is that 30-40 avg / day?

Crypt-iQ avatar Aug 30 '22 16:08 Crypt-iQ

30-40 avg simultaneously. Active HTLCs at one moment. I open RTL ("active HTLCs") and see count there, which is usually 30-40. Sometimes 80-90, but it doesn't last long and goes down pretty fast. How many per day... I don't know.

UPD: this count is usually higher when I try to rebalance something with bos rebalance. If I do nothing at all, it is usually 15-30.

varg-finance avatar Aug 30 '22 16:08 varg-finance

Can I try something maybe? Save backups and change something? Second day of node offline is so painful... Expired HTLCs, automatic force-closes of many good channels...

However, if it is not safe, there is no choice. Anyway, I am very grateful for the help and assistance in this situation...

varg-finance avatar Aug 30 '22 17:08 varg-finance

@yyforyongyu do you mind rebasing your PR on top of 0.15.1?

That way @varg-finance can apply https://github.com/lightningnetwork/lnd/pull/6624 and use other channels. The patch isn't tested on a live node afaik. You may be able to close the channel point in question without error, not sure.

Crypt-iQ avatar Aug 30 '22 17:08 Crypt-iQ

Actually, if you end up using that patch when it's ready, I don't think you should try to close the buggy channel point - it might lead to a similar situation. If you have issues with that patch (lnd not starting up, but the panic is gone), we can give you a temporary one on top of that one.

Crypt-iQ avatar Aug 30 '22 21:08 Crypt-iQ

Ok. I am ready to provide you all the logs or other information and won't do anything without your approval.

varg-finance avatar Aug 30 '22 21:08 varg-finance

Here's the branch rebased: https://github.com/lightningnetwork/lnd/compare/v0.15.1-branch...Roasbeef:lnd:remote-log-rebase?expand=1

Can clone from: https://github.com/Roasbeef/lnd/tree/remote-log-rebase

Roasbeef avatar Aug 31 '22 00:08 Roasbeef

Unfortunately, it seems like I need some help with installation. I always used the pre-built binary from the releases before, sorry for wasting your time : (

  1. I tried to clone https://github.com/Roasbeef/lnd/tree/remote-log-rebase via HTTPS (git clone https://github.com/Roasbeef/lnd.git). Installation failed:

[screenshot 1]

  2. I tried to download the zip file, unzip it, and install. Also unsuccessful:

[screenshot 2]

I understand that I make the simplest mistakes, but I need assistance...

I have go version go1.18 linux/amd64.

varg-finance avatar Aug 31 '22 08:08 varg-finance

@varg-finance after the git clone, you need to check out the branch with git checkout remote-log-rebase; then make install should work.
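Put together, the steps look roughly like the sketch below. The repository URL and branch name come from this thread; `run` and `build_lnd_branch` are hypothetical helper names, and the dry-run switch is only there so the sequence can be previewed without touching the network.

```shell
# Sketch of the clone / checkout / install sequence described above.
REPO_URL="https://github.com/Roasbeef/lnd.git"   # from this thread
BRANCH="remote-log-rebase"                        # from this thread

# run CMD...: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# build_lnd_branch [DEST]: clone into DEST (default: ./lnd), switch to
# the patched branch, then build and install the binaries.
build_lnd_branch() {
    local dest="${1:-lnd}"
    run git clone "$REPO_URL" "$dest" &&
        run git -C "$dest" checkout "$BRANCH" &&
        run make -C "$dest" install
}
```

Calling `build_lnd_branch ~/lnd-src` with `DRY_RUN=1` set only prints the three commands; without it they are executed, and a Go toolchain must already be installed for the last step.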

yyforyongyu avatar Aug 31 '22 09:08 yyforyongyu

@yyforyongyu , something goes wrong again...

[screenshot 3]

...nothing changed... behavior of LND remains the same.

Fresh debug log: new 1.txt

varg-finance avatar Aug 31 '22 09:08 varg-finance

make install will put the binary in your $GOPATH/bin directory. So look for an lnd binary there, and add $GOPATH/bin to your $PATH.
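A small sketch of that check, assuming GOBIN is unset and GOPATH is a single directory; `gobin_dir`, `path_contains` and `add_to_path` are hypothetical helper names:

```shell
# gobin_dir: print the directory where the freshly built binaries land.
# Falls back to $HOME/go, Go's default GOPATH, if `go env` is unavailable.
gobin_dir() {
    local gopath
    gopath="$(go env GOPATH 2>/dev/null)" || gopath=""
    echo "${gopath:-$HOME/go}/bin"
}

# path_contains DIR: succeed if DIR is already an entry of $PATH.
path_contains() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# add_to_path DIR: prepend DIR to $PATH for this shell, at most once.
add_to_path() {
    path_contains "$1" || PATH="$1:$PATH"
    export PATH
}
```

After `add_to_path "$(gobin_dir)"`, `command -v lnd` shows which binary the shell resolves and `lnd --version` shows its build. Note that the systemd status output earlier in this issue shows the service starting /usr/local/bin/lnd, so changing $PATH in a shell would not by itself change which binary the service runs.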

guggero avatar Aug 31 '22 09:08 guggero

Finally, lnd version 0.15.1-beta commit=v0.3-alpha-11443-g5cbdcde5e !

LND started, works as normal for 1-2 minutes, then shuts down (NOT a panic) starting from this line:

2022-08-31 13:11:21.152 [ERR] CNCT: ChannelArbitrator(67d1c841ad1ff92666ec3829f8c18d57b3fda89b8ba37bc71c2eca2731ce8675:0): unable to force close: No HTLC with ID 670 in channel 750573:1517:0
2022-08-31 13:11:21.152 [ERR] CNCT: ChannelArbitrator(67d1c841ad1ff92666ec3829f8c18d57b3fda89b8ba37bc71c2eca2731ce8675:0): unable to advance state: No HTLC with ID 670 in channel 750573:1517:0
2022-08-31 13:11:21.152 [INF] CNCT: ChainArbitrator shutting down
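To pull the affected channel out of a longer log, the three identifiers in these error lines (channel point, short channel ID, HTLC ID) can be extracted with a small filter. This is a sketch that assumes one log entry per line in exactly the format quoted here; `extract_htlc_errors` and `scid_to_uint64` are hypothetical helper names. The second helper packs the block:tx:output form into the single 64-bit number defined by BOLT 7 (3 bytes block height, 3 bytes transaction index, 2 bytes output index), which is assumed to be how tools that show a numeric channel ID encode the same channel.

```shell
# extract_htlc_errors FILE
# Prints "<channel point> <short channel id> <htlc id>" once for each
# distinct "No HTLC with ID" error found in FILE.
extract_htlc_errors() {
    sed -n -E 's/.*ChannelArbitrator\(([0-9a-f]{64}:[0-9]+)\): .*No HTLC with ID ([0-9]+) in channel ([0-9]+:[0-9]+:[0-9]+).*/\1 \3 \2/p' "$1" |
        sort -u
}

# scid_to_uint64 BLOCK:TX:OUTPUT
# Packs the three parts into one integer:
# block height in bits 40-63, tx index in bits 16-39, output in bits 0-15.
scid_to_uint64() {
    local block rest tx out
    block=${1%%:*}
    rest=${1#*:}
    tx=${rest%%:*}
    out=${rest#*:}
    echo $(( (block << 40) | (tx << 16) | out ))
}
```

Running `extract_htlc_errors` on the log and passing the second field to `scid_to_uint64` gives values that can be matched against the channel list reported by the node.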

full log: log 3.txt

UPD: all restarts end in the same way.

varg-finance avatar Aug 31 '22 10:08 varg-finance