Trying to remove tower from client causes rpc error: code = Unknown desc = tower has unacked updates

Open iangregsondev opened this issue 5 years ago • 15 comments

Background

Trying to remove a tower from wtclient, but it gives me an error.

Your environment

  • version of lnd : lnd version 0.10.99-beta commit=clock/v1.0.0-106-gc1ef5bb908606343d2636c8cd345169e064bdc91
  • which operating system (uname -a on *Nix) : ubuntu (in container)
  • version of btcd, bitcoind, or other backend : bitcoind 0.19.1
  • any other relevant environment details

Steps to reproduce

I have a wtclient setup to the wrong address, in fact in the logs it is telling me it can’t dial. I want to remove it but it doesn’t let me

bash-5.0# lncli --macaroonpath /lnd/chain/bitcoin/testnet/admin.macaroon   --tlscertpath /shared/tls.cert wtclient remove 038f7f36689b9d7274702f5bce3a5d8bc4596d4894d9985c5203604fff4daef425
[lncli] rpc error: code = Unknown desc = tower has unacked updates

This is what the towers command returns:

   "towers": [
        {
            "pubkey": "038f7f36689b9d7274702f5bce3a5d8bc4596d4894d9985c5203604fff4daef425",
            "addresses": [
                "157.245.68.69:9911"
            ],
            "active_session_candidate": true,
            "num_sessions": 1,
            "sessions": [
            ]
        },

Expected behaviour

Allow me to remove a tower or allow some way to FORCE removal

Actual behaviour

It won't let me, please see the issue above.

iangregsondev avatar Jun 26 '20 17:06 iangregsondev

I believe removing a tower when a backup hasn't been fully processed isn't safe at the moment because it won't be replayed to any other existing towers cc @cfromknecht.

I have a wtclient setup to the wrong address, in fact in the logs it is telling me it can’t dial. I want to remove it but it doesn’t let me

If you just want to modify the address, then you can specify it in the lncli wtclient remove command, and add the new one with the lncli wtclient add command.
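
As a minimal sketch of that remove-then-add sequence: to my understanding, both commands take a single argument, with the network address joined to the public key by an @ rather than passed as a second argument (check lncli wtclient remove --help to confirm). The helper name swap_tower_address and the NEW_HOST placeholder are assumptions for illustration. The helper only prints the commands so they can be reviewed before running them.

```shell
#!/bin/sh
# Hypothetical helper: print the two lncli invocations needed to replace a
# tower's network address. Nothing is executed against lnd.
# Usage: swap_tower_address <tower pubkey> <old host:port> <new host:port>
swap_tower_address() {
    pubkey=$1
    old_addr=$2
    new_addr=$3
    # Drop only the stale address from the tower's record.
    echo "lncli wtclient remove ${pubkey}@${old_addr}"
    # Register the address the tower is actually reachable at.
    echo "lncli wtclient add ${pubkey}@${new_addr}"
}

# Example with the tower from this issue; NEW_HOST:9911 is a placeholder.
swap_tower_address \
    038f7f36689b9d7274702f5bce3a5d8bc4596d4894d9985c5203604fff4daef425 \
    157.245.68.69:9911 \
    NEW_HOST:9911
```

Add the usual --macaroonpath and --tlscertpath flags when running the printed commands for real.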

wpaulino avatar Jun 26 '20 18:06 wpaulino

Hi @wpaulino thanks for the response.

That is true, I did want to modify the address but I wasn't aware I could.

I have just tried the following but it did not work

lncli --macaroonpath /lnd/chain/bitcoin/testnet/admin.macaroon --tlscertpath /shared/tls.cert wtclient remove 038f7f36689b9d7274702f5bce3a5d8bc4596d4894d9985c5203604fff4daef425 021f6fddf84ccaf1a87c99634770e2b7fb25eac890f8f4e5501abbf1a60b25d4fc

So the address starting with 038 is what I wish to remove OR change to the address starting with 021.

I can easily add a new one which I can do, but removing OR changing the one mentioned above doesn't work.

Maybe I have the syntax of "wtclient remove" wrong

Can you confirm ?

Thanks

iangregsondev avatar Jun 27 '20 06:06 iangregsondev

Ah, I thought you were referring to a network address. If you want to modify the public key, then you had the correct command the first time which resulted in the error (lncli wtclient remove PUBLIC_KEY). Are you able to add the new tower without needing to remove the stale one?

wpaulino avatar Jun 29 '20 18:06 wpaulino

@wpaulino sure - that's what I have done, so I have a new watchtower client connecting to the correct URL. All good.

But the old one still persists and although it's not causing any issues, it does output errors of WTC into the logs.

I just thought it would be possible to force remove it.

iangregsondev avatar Jun 30 '20 06:06 iangregsondev

I think this has been resolved?

Roasbeef avatar Jan 21 '21 00:01 Roasbeef

The root issue hasn't, which is replaying any pending backups to new towers. That would make the removal of a stale tower possible.

wpaulino avatar Jan 21 '21 19:01 wpaulino

Is there any way to force remove a watchtower? I've got pending backups that won't go through, so I get the same error here. In the meantime I've added a new watchtower but I still see RPC log spam from the unremovable old WT.

djkazic avatar Jul 18 '21 14:07 djkazic

FWIW when I added a new watchtower it uploaded backup states without an issue. I suspect the underlying issue is that when the channel.db was corrupted by a power failure, I restored the most up to date version of it that wasn't corrupt. Could this "travel back in time" cause this broken state for watchtowers?

djkazic avatar Jul 20 '21 11:07 djkazic

You should never restore the channel.db file from a backup! You'll not only put your funds at risk but cause all sorts of problems. The watchtower not being able to send backups just being one of them.

guggero avatar Jul 20 '21 12:07 guggero

I didn't want to lose all my channels. I saved 12 of them by doing so, and some others force closed.

djkazic avatar Jul 20 '21 12:07 djkazic

In addition, in this case I confirmed no payments had been routed from the time of the backup to the time of corruption.

Considering that, I don't think there was substantial risk versus guaranteed closing all channels (thus having to pay two on chain tx fees -- one to close and one to reopen after restoring from an SCB).

djkazic avatar Jul 21 '21 15:07 djkazic

I have the same issue. One of the watchtowers I use has gone offline and I keep getting errors trying to upload to it. lncli wtclient remove gives this message in lnd.log:

2022-01-03 16:11:38.819 [ERR] RPCS: [/wtclientrpc.WatchtowerClient/RemoveTower]: tower has unacked updates

So does it mean that until the tower is online again I will not be able to remove it (and it will keep trying to dial the tower until then)? I noted that @C-Otto reported a memory leak caused by backups queuing up when a tower is offline.

BhaagBoseDK avatar Jan 03 '22 17:01 BhaagBoseDK

I also have this issue with unacked updates. Aside from not being able to remove the unsynched tower to silence its log entries I don't seem to be able to add a new tower. It adds the new tower and shows the new one in lncli wtclient towers but it never creates an active session to the new tower. Since I can't take advantage of watch tower protection anymore I am just going to set wtclient.active=0 in my config. Is there any other solution?

kshartman avatar Jan 05 '22 04:01 kshartman

I'm having the same error too. The weird thing is that my tower is even online (the first and last lines of these logs show successful backups):

2022-06-25 21:08:03.502 [INF] WTCL: (anchor) Queued backup(f57a7852b48a4b9ecffa39602ec77d77b14b78f74053682142e24d37692da1c5, 4381) successfully for session 03f4573b60e91f2fcf7b9132150388714faa9bca7f96a4c1aae0220573cbe1442b
2022-06-25 21:08:07.096 [INF] WTCL: (legacy) Client stats: tasks(received=0 accepted=35722 ineligible=0) sessions(acquired=0 exhausted=78)
2022-06-25 21:08:09.187 [INF] WTCL: (anchor) Client stats: tasks(received=0 accepted=31402 ineligible=0) sessions(acquired=0 exhausted=65)
2022-06-25 21:08:21.971 [ERR] WTCL: (legacy) SessionQueue(02f13ecb0207727944100f0ec09642983e5d40a3058cc615b6a55dab760d3b809d) unable to dial tower at 0301135932e89600b3582513c648d46213dc425c7666e3380faa7dbb51f7e6a3d6@tower4excc3jdaoxcqzbw7gzipoknzqn3dbnw2kfdfhpvvbxagrzmfad.onion:9911: dial proxy failed: socks connect tcp 172.22.0.2:9050->tower4excc3jdaoxcqzbw7gzipoknzqn3dbnw2kfdfhpvvbxagrzmfad.onion:9911: unknown error host unreachable
2022-06-25 21:08:21.971 [ERR] WTCL: (legacy) SessionQueue(03720b216fa0ad2b9cb9a9743d0c5f5c4c503f6853fb8c0a2e406b41ef0f8dd40c) unable to dial tower at 0301135932e89600b3582513c648d46213dc425c7666e3380faa7dbb51f7e6a3d6@tower4excc3jdaoxcqzbw7gzipoknzqn3dbnw2kfdfhpvvbxagrzmfad.onion:9911: dial proxy failed: socks connect tcp 172.22.0.2:9050->tower4excc3jdaoxcqzbw7gzipoknzqn3dbnw2kfdfhpvvbxagrzmfad.onion:9911: unknown error host unreachable
2022-06-25 21:08:21.975 [ERR] WTCL: (anchor) SessionQueue(02ce2976a7a557c799e273c881d258e84e48bc9af40b9c44f94ee53edb08e68fe4) unable to dial tower at 0301135932e89600b3582513c648d46213dc425c7666e3380faa7dbb51f7e6a3d6@tower4excc3jdaoxcqzbw7gzipoknzqn3dbnw2kfdfhpvvbxagrzmfad.onion:9911: dial proxy failed: socks connect tcp 172.22.0.2:9050->tower4excc3jdaoxcqzbw7gzipoknzqn3dbnw2kfdfhpvvbxagrzmfad.onion:9911: unknown error host unreachable
2022-06-25 21:08:21.975 [ERR] WTCL: (anchor) SessionQueue(03feb3ccaf6637fa0c7f1478576b592e17ef7e863d1c3dbcedfe7e92e1e892ef53) unable to dial tower at 0301135932e89600b3582513c648d46213dc425c7666e3380faa7dbb51f7e6a3d6@tower4excc3jdaoxcqzbw7gzipoknzqn3dbnw2kfdfhpvvbxagrzmfad.onion:9911: dial proxy failed: socks connect tcp 172.22.0.2:9050->tower4excc3jdaoxcqzbw7gzipoknzqn3dbnw2kfdfhpvvbxagrzmfad.onion:9911: unknown error host unreachable
2022-06-25 21:08:21.976 [ERR] WTCL: (legacy) SessionQueue(032c18c936e9423fdabe9cb65320479714e5113e7f40617b4a075768ed4dab8ebd) unable to dial tower at 0301135932e89600b3582513c648d46213dc425c7666e3380faa7dbb51f7e6a3d6@tower4excc3jdaoxcqzbw7gzipoknzqn3dbnw2kfdfhpvvbxagrzmfad.onion:9911: dial proxy failed: socks connect tcp 172.22.0.2:9050->tower4excc3jdaoxcqzbw7gzipoknzqn3dbnw2kfdfhpvvbxagrzmfad.onion:9911: unknown error host unreachable
2022-06-25 21:08:21.976 [ERR] WTCL: (legacy) SessionQueue(023bd223b12cf54efadd2e6bfc6d94b449854d10f336a9a367226e89508cd57eea) unable to dial tower at 0301135932e89600b3582513c648d46213dc425c7666e3380faa7dbb51f7e6a3d6@tower4excc3jdaoxcqzbw7gzipoknzqn3dbnw2kfdfhpvvbxagrzmfad.onion:9911: dial proxy failed: socks connect tcp 172.22.0.2:9050->tower4excc3jdaoxcqzbw7gzipoknzqn3dbnw2kfdfhpvvbxagrzmfad.onion:9911: unknown error host unreachable
2022-06-25 21:08:22.026 [ERR] WTCL: (legacy) SessionQueue(02f719fee55366185314ac842660f3199ba56a500c76f1a2e2c37c451e9697ebfb) unable to dial tower at 0301135932e89600b3582513c648d46213dc425c7666e3380faa7dbb51f7e6a3d6@tower4excc3jdaoxcqzbw7gzipoknzqn3dbnw2kfdfhpvvbxagrzmfad.onion:9911: dial proxy failed: socks connect tcp 172.22.0.2:9050->tower4excc3jdaoxcqzbw7gzipoknzqn3dbnw2kfdfhpvvbxagrzmfad.onion:9911: unknown error host unreachable
2022-06-25 21:08:22.027 [ERR] WTCL: (anchor) SessionQueue(026caee2b3c455a79a1921a1e21da46bd7804b76fb686cd66e7cc93889663fe913) unable to dial tower at 0301135932e89600b3582513c648d46213dc425c7666e3380faa7dbb51f7e6a3d6@tower4excc3jdaoxcqzbw7gzipoknzqn3dbnw2kfdfhpvvbxagrzmfad.onion:9911: dial proxy failed: socks connect tcp 172.22.0.2:9050->tower4excc3jdaoxcqzbw7gzipoknzqn3dbnw2kfdfhpvvbxagrzmfad.onion:9911: unknown error host unreachable
2022-06-25 21:08:28.723 [INF] WTCL: (anchor) Queued backup(f57a7852b48a4b9ecffa39602ec77d77b14b78f74053682142e24d37692da1c5, 4382) successfully for session 03f4573b60e91f2fcf7b9132150388714faa9bca7f96a4c1aae0220573cbe1442b
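
To see which towers account for the dial errors in a log like the one above, the lines can be grouped by dial target. This is a rough sketch that assumes the "unable to dial tower at <target>: <reason>" message format shown here; the function name count_dial_failures is made up for illustration.

```shell
#!/bin/sh
# Count "unable to dial tower" errors per dial target.
# Reads lnd log lines on stdin and prints "<count> <target>", highest first.
count_dial_failures() {
    awk -F'unable to dial tower at ' 'NF > 1 {
        target = $2
        # Keep only the text before the first ": ", which starts the reason.
        sub(/: .*/, "", target)
        count[target]++
    }
    END { for (t in count) print count[t], t }' | sort -rn
}

# Usage (the log path is an assumption, adjust to your setup):
#   count_dial_failures < ~/.lnd/logs/bitcoin/mainnet/lnd.log
```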

bota87 avatar Jun 25 '22 21:06 bota87

@guggero I'm sorry for tagging you directly out of the blue, yet I'm still waiting on an official reply on this matter: would deleting

towerclientdb_kv
towerserverdb_kv

from a postgres database have the exact same effect as deleting the watchtower.db for boltdb? Will my funds be safe? I wouldn't like to find surprises, and I have this tower which is not working for me and I can't remove otherwise.

GordianLN avatar Jul 30 '22 17:07 GordianLN

I'm running into this same issue. The problem is that LND keeps using an increasing amount of RAM because the tower is offline. After a few weeks all the RAM+SWAP are completely filled by this memory leak, and the OOM killer kills the LND process.

The simple workaround would be to remove the offline tower, but I'm unable to because of "tower has unacked updates".

So the end result is that I'm now stuck with an LND node which has a huge memory leak, and no way to solve it (except for setting wtclient.active=0).
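
For reference, that setting would look roughly like this in lnd.conf. The [wtclient] section header is how I understand the sample config to be laid out, so treat it as an assumption. Disabling the client stops the dial attempts, but it also means no further backups are sent to any tower.

```ini
[wtclient]
; Disable the watchtower client entirely (value as quoted in this thread).
wtclient.active=0
```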

kroese avatar Sep 03 '22 00:09 kroese

Indeed it's basically a catch 22. You should remove the tower because the tower is inoperative, but you can't remove the tower because the tower is inoperative. Wonder what was the logic that led to having LND not allow removing a WT when unacked updates existed, not like abandoning those updates would cause anything bad since you're removing the tower anyway.

GordianLN avatar Sep 04 '22 08:09 GordianLN

Wonder what was the logic that led to having LND not allow removing a WT when unacked updates existed, not like abandoning those updates would cause anything bad since you're removing the tower anyway.

The tower might be useless (decommissioned, broken, ...), but the unacked updates might be extremely important. Instead of dropping them, they should be forwarded to another WT (assuming one exists). In my understanding, that's what @ellemouton is doing in the linked PR.

C-Otto avatar Oct 16 '22 08:10 C-Otto

For now you have to stop lnd, remove wtclient.db, and restart lnd.
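
A cautious sketch of that workaround, which moves the file aside instead of deleting it so it can be restored. The function name, the systemd service name and the data directory path are all assumptions. Note that this throws away the client's tower sessions together with the unacked updates, so towers have to be added again afterwards.

```shell
#!/bin/sh
# Hypothetical helper: move wtclient.db aside with a timestamped suffix.
# Run it only while lnd is stopped.
# Usage: set_aside_wtclient_db <directory containing wtclient.db>
set_aside_wtclient_db() {
    db="$1/wtclient.db"
    if [ ! -f "$db" ]; then
        echo "no wtclient.db found in $1" >&2
        return 1
    fi
    mv "$db" "$db.bak.$(date +%Y%m%d%H%M%S)"
}

# Assumed sequence (service name and path are guesses, adjust to your setup):
#   systemctl stop lnd
#   set_aside_wtclient_db "$HOME/.lnd/data/graph/mainnet"
#   systemctl start lnd
#   lncli wtclient add <pubkey>@<host:port>
```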

WTs have a few issues: they are either offline or out of sync (StateUpdateCodeClientBehind in StateUpdateReply for seqnum=1).

The above renders them useless, you can't remove them at runtime, and it causes a memory leak.

Glad to see that WTs, which seemed like an absolute afterthought, are being paid attention to.

indomitorum avatar Oct 16 '22 12:10 indomitorum

After a few weeks all the RAM+SWAP are completely filled by this memory leak, and the OOM killer kills the LND process.

@kroese seems that I'm running into the same issue as you except that it happens randomly and sometimes even within a few minutes (16GB RAM). The last log message before my node gets killed (and then restarted by systemd) is always WTCL related:

2022-11-14 11:22:44.662 [ERR] WTCL: (anchor) SessionQueue(0374a7fdae50a0b28ca0a8b501f384fcbd8d18cb488ce28cb9c490c00c314cdfae) unable to dial tower at any available Addresses: dial proxy failed: socks connect tcp 127.0.0.1:9050->iiu4epqzm6cydqhezueenccjlyzrqeruntlzbx47mlmdgfwgtrll66qd.onion:9911: unknown error host unreachable
2022-11-14 11:23:45.126 [INF] LTND: Version: 0.15.99-beta commit=tor/v1.1.0-288-g31a803c93, build=production, logging=default, debuglevel=info,DISC=critical

mariodian avatar Nov 14 '22 04:11 mariodian

@mariodian that's #5983. Either make sure the watchtower is reachable, remove the watchtower (you might need to include the latest patches from master for that, which is a bit tricky), or disable the watchtower client feature. Restarting lnd before the OOM killer strikes might also work.

C-Otto avatar Nov 14 '22 07:11 C-Otto

@C-Otto thanks! I shut down lnd, removed wtclient.db and added back watchtowers that I know are online most of the time. This seems to be working for now.

mariodian avatar Nov 15 '22 09:11 mariodian