lnd
Trying to remove tower from client causes rpc error: code = Unknown desc = tower has unacked updates
Background
I am trying to remove a tower with wtclient, but it returns an error.
Your environment
- version of lnd: lnd version 0.10.99-beta commit=clock/v1.0.0-106-gc1ef5bb908606343d2636c8cd345169e064bdc91
- which operating system (uname -a on *Nix): ubuntu (in container)
- version of btcd, bitcoind, or other backend: bitcoind 0.19.1
- any other relevant environment details
Steps to reproduce
I have a wtclient setup to the wrong address, in fact in the logs it is telling me it can’t dial. I want to remove it but it doesn’t let me
bash-5.0# lncli --macaroonpath /lnd/chain/bitcoin/testnet/admin.macaroon --tlscertpath /shared/tls.cert wtclient remove 038f7f36689b9d7274702f5bce3a5d8bc4596d4894d9985c5203604fff4daef425
[lncli] rpc error: code = Unknown desc = tower has unacked updates
This is what the towers command returns:
"towers": [
    {
        "pubkey": "038f7f36689b9d7274702f5bce3a5d8bc4596d4894d9985c5203604fff4daef425",
        "addresses": [
            "157.245.68.69:9911"
        ],
        "active_session_candidate": true,
        "num_sessions": 1,
        "sessions": [
        ]
    },
Expected behaviour
Allow me to remove a tower or allow some way to FORCE removal
Actual behaviour
It won't let me, please see the issue above.
I believe removing a tower when a backup hasn't been fully processed isn't safe at the moment because it won't be replayed to any other existing towers cc @cfromknecht.
I have a wtclient setup to the wrong address, in fact in the logs it is telling me it can’t dial. I want to remove it but it doesn’t let me
If you just want to modify the address, then you can specify it in the lncli wtclient remove command, and add the new one with the lncli wtclient add command.
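A minimal sketch of the address swap described above. It only prints the two lncli calls so they can be reviewed before running. The helper name `swap_tower_addr` and all argument values are hypothetical placeholders. The add is printed first on the assumption that lnd may refuse to remove a tower's only remaining address; that ordering is a guess, not something confirmed in this thread.

```shell
#!/bin/sh
# Sketch only: prints the commands, does not execute them.
# swap_tower_addr is a hypothetical helper name, not part of lncli.
swap_tower_addr() {
    pubkey=$1
    old_addr=$2
    new_addr=$3
    # Register the new network address for the same tower pubkey.
    echo "lncli wtclient add ${pubkey}@${new_addr}"
    # Drop only the stale address; the tower itself stays registered.
    echo "lncli wtclient remove ${pubkey} ${old_addr}"
}
```

For example, `swap_tower_addr PUBLIC_KEY OLD_HOST:9911 NEW_HOST:9911` prints both commands. Add your own `--macaroonpath` and `--tlscertpath` options before running them.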
Hi @wpaulino thanks for the response.
That is true, I did want to modify the address but I wasn't aware I could.
I have just tried the following but it did not work
lncli --macaroonpath /lnd/chain/bitcoin/testnet/admin.macaroon --tlscertpath /shared/tls.cert wtclient remove 038f7f36689b9d7274702f5bce3a5d8bc4596d4894d9985c5203604fff4daef425 021f6fddf84ccaf1a87c99634770e2b7fb25eac890f8f4e5501abbf1a60b25d4fc
So the address starting with 038 is the one I wish to remove OR replace with the address starting with 021.
I can easily add a new one which I can do, but removing OR changing the one mentioned above doesn't work.
Maybe I have the syntax of "wtclient remove" wrong
Can you confirm ?
Thanks
Ah, I thought you were referring to a network address. If you want to modify the public key, then you had the correct command the first time which resulted in the error (lncli wtclient remove PUBLIC_KEY). Are you able to add the new tower without needing to remove the stale one?
@wpaulino sure - that's what I have done, so I have a new watchtower client connecting to the correct URL. All good.
But the old one still persists and although it's not causing any issues, it does output errors of WTC into the logs.
I just thought it would be possible to force remove it.
I think this has been resolved?
The root issue hasn't, which is replaying any pending backups to new towers. That would make the removal of a stale tower possible.
Is there any way to force remove a watchtower? I've got pending backups that won't go through, so I get the same error here. In the meantime I've added a new watchtower but I still see RPC log spam from the unremovable old WT.
FWIW when I added a new watchtower it uploaded backup states without an issue. I suspect the underlying issue is that when the channel.db was corrupted by a power failure, I restored the most up to date version of it that wasn't corrupt. Could this "travel back in time" cause this broken state for watchtowers?
You should never restore the channel.db file from a backup! You'll not only put your funds at risk but cause all sorts of problems. The watchtower not being able to send backups just being one of them.
I didn't want to lose all my channels. I saved 12 of them by doing so, and some others force closed.
In addition, in this case I confirmed no payments had been routed from the time of the backup to the time of corruption.
Considering that, I don't think there was substantial risk versus guaranteed closing all channels (thus having to pay two on chain tx fees -- one to close and one to reopen after restoring from an SCB).
I have the same issue. One of the watchtowers I use has gone offline and I keep getting errors trying to upload to it.
lncli wtclient remove
2022-01-03 16:11:38.819 [ERR] RPCS: [/wtclientrpc.WatchtowerClient/RemoveTower]: tower has unacked updates
So does this mean that until the tower is online again I will not be able to remove it (and lnd will keep trying to dial the tower until then)? I noted that @C-Otto reported a memory leak caused by backups queuing up for an offline tower.
I also have this issue with unacked updates. Aside from not being able to remove the unsynched tower to silence its log entries I don't seem to be able to add a new tower. It adds the new tower and shows the new one in lncli wtclient towers but it never creates an active session to the new tower. Since I can't take advantage of watch tower protection anymore I am just going to set wtclient.active=0 in my config. Is there any other solution?
I'm having the same error too. The weird thing is that my tower is even online (the first and last lines of these logs show successful backups):
2022-06-25 21:08:03.502 [INF] WTCL: (anchor) Queued backup(f57a7852b48a4b9ecffa39602ec77d77b14b78f74053682142e24d37692da1c5, 4381) successfully for session 03f4573b60e91f2fcf7b9132150388714faa9bca7f96a4c1aae0220573cbe1442b
2022-06-25 21:08:07.096 [INF] WTCL: (legacy) Client stats: tasks(received=0 accepted=35722 ineligible=0) sessions(acquired=0 exhausted=78)
2022-06-25 21:08:09.187 [INF] WTCL: (anchor) Client stats: tasks(received=0 accepted=31402 ineligible=0) sessions(acquired=0 exhausted=65)
2022-06-25 21:08:21.971 [ERR] WTCL: (legacy) SessionQueue(02f13ecb0207727944100f0ec09642983e5d40a3058cc615b6a55dab760d3b809d) unable to dial tower at 0301135932e89600b3582513c648d46213dc425c7666e3380faa7dbb51f7e6a3d6@tower4excc3jdaoxcqzbw7gzipoknzqn3dbnw2kfdfhpvvbxagrzmfad.onion:9911: dial proxy failed: socks connect tcp 172.22.0.2:9050->tower4excc3jdaoxcqzbw7gzipoknzqn3dbnw2kfdfhpvvbxagrzmfad.onion:9911: unknown error host unreachable
2022-06-25 21:08:21.971 [ERR] WTCL: (legacy) SessionQueue(03720b216fa0ad2b9cb9a9743d0c5f5c4c503f6853fb8c0a2e406b41ef0f8dd40c) unable to dial tower at 0301135932e89600b3582513c648d46213dc425c7666e3380faa7dbb51f7e6a3d6@tower4excc3jdaoxcqzbw7gzipoknzqn3dbnw2kfdfhpvvbxagrzmfad.onion:9911: dial proxy failed: socks connect tcp 172.22.0.2:9050->tower4excc3jdaoxcqzbw7gzipoknzqn3dbnw2kfdfhpvvbxagrzmfad.onion:9911: unknown error host unreachable
2022-06-25 21:08:21.975 [ERR] WTCL: (anchor) SessionQueue(02ce2976a7a557c799e273c881d258e84e48bc9af40b9c44f94ee53edb08e68fe4) unable to dial tower at 0301135932e89600b3582513c648d46213dc425c7666e3380faa7dbb51f7e6a3d6@tower4excc3jdaoxcqzbw7gzipoknzqn3dbnw2kfdfhpvvbxagrzmfad.onion:9911: dial proxy failed: socks connect tcp 172.22.0.2:9050->tower4excc3jdaoxcqzbw7gzipoknzqn3dbnw2kfdfhpvvbxagrzmfad.onion:9911: unknown error host unreachable
2022-06-25 21:08:21.975 [ERR] WTCL: (anchor) SessionQueue(03feb3ccaf6637fa0c7f1478576b592e17ef7e863d1c3dbcedfe7e92e1e892ef53) unable to dial tower at 0301135932e89600b3582513c648d46213dc425c7666e3380faa7dbb51f7e6a3d6@tower4excc3jdaoxcqzbw7gzipoknzqn3dbnw2kfdfhpvvbxagrzmfad.onion:9911: dial proxy failed: socks connect tcp 172.22.0.2:9050->tower4excc3jdaoxcqzbw7gzipoknzqn3dbnw2kfdfhpvvbxagrzmfad.onion:9911: unknown error host unreachable
2022-06-25 21:08:21.976 [ERR] WTCL: (legacy) SessionQueue(032c18c936e9423fdabe9cb65320479714e5113e7f40617b4a075768ed4dab8ebd) unable to dial tower at 0301135932e89600b3582513c648d46213dc425c7666e3380faa7dbb51f7e6a3d6@tower4excc3jdaoxcqzbw7gzipoknzqn3dbnw2kfdfhpvvbxagrzmfad.onion:9911: dial proxy failed: socks connect tcp 172.22.0.2:9050->tower4excc3jdaoxcqzbw7gzipoknzqn3dbnw2kfdfhpvvbxagrzmfad.onion:9911: unknown error host unreachable
2022-06-25 21:08:21.976 [ERR] WTCL: (legacy) SessionQueue(023bd223b12cf54efadd2e6bfc6d94b449854d10f336a9a367226e89508cd57eea) unable to dial tower at 0301135932e89600b3582513c648d46213dc425c7666e3380faa7dbb51f7e6a3d6@tower4excc3jdaoxcqzbw7gzipoknzqn3dbnw2kfdfhpvvbxagrzmfad.onion:9911: dial proxy failed: socks connect tcp 172.22.0.2:9050->tower4excc3jdaoxcqzbw7gzipoknzqn3dbnw2kfdfhpvvbxagrzmfad.onion:9911: unknown error host unreachable
2022-06-25 21:08:22.026 [ERR] WTCL: (legacy) SessionQueue(02f719fee55366185314ac842660f3199ba56a500c76f1a2e2c37c451e9697ebfb) unable to dial tower at 0301135932e89600b3582513c648d46213dc425c7666e3380faa7dbb51f7e6a3d6@tower4excc3jdaoxcqzbw7gzipoknzqn3dbnw2kfdfhpvvbxagrzmfad.onion:9911: dial proxy failed: socks connect tcp 172.22.0.2:9050->tower4excc3jdaoxcqzbw7gzipoknzqn3dbnw2kfdfhpvvbxagrzmfad.onion:9911: unknown error host unreachable
2022-06-25 21:08:22.027 [ERR] WTCL: (anchor) SessionQueue(026caee2b3c455a79a1921a1e21da46bd7804b76fb686cd66e7cc93889663fe913) unable to dial tower at 0301135932e89600b3582513c648d46213dc425c7666e3380faa7dbb51f7e6a3d6@tower4excc3jdaoxcqzbw7gzipoknzqn3dbnw2kfdfhpvvbxagrzmfad.onion:9911: dial proxy failed: socks connect tcp 172.22.0.2:9050->tower4excc3jdaoxcqzbw7gzipoknzqn3dbnw2kfdfhpvvbxagrzmfad.onion:9911: unknown error host unreachable
2022-06-25 21:08:28.723 [INF] WTCL: (anchor) Queued backup(f57a7852b48a4b9ecffa39602ec77d77b14b78f74053682142e24d37692da1c5, 4382) successfully for session 03f4573b60e91f2fcf7b9132150388714faa9bca7f96a4c1aae0220573cbe1442b
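To see which tower is behind dial errors like the ones above, the log lines can be tallied per tower. This is a rough sketch that assumes the `unable to dial tower at PUBKEY@HOST:PORT:` line format shown in this excerpt; `count_dial_failures` is a hypothetical helper name and other lnd versions may log differently.

```shell
#!/bin/sh
# Sketch only: reads lnd log lines on stdin and prints "COUNT PUBKEY@HOST"
# for each tower that appears in an "unable to dial tower at" error.
# Lines that do not match the assumed format are silently skipped.
count_dial_failures() {
    sed -n 's/.*unable to dial tower at \([0-9a-f]*@[^:]*\):.*/\1/p' |
        sort | uniq -c | sort -rn |
        awk '{print $1, $2}'
}
```

Usage would look like `count_dial_failures < lnd.log`, where `lnd.log` stands for wherever your node writes its log.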
@guggero I'm sorry for tagging you directly out of the blue, yet I'm still waiting on an official reply on this matter: would deleting
towerclientdb_kv
towerserverdb_kv
from a postgres database have the exact same effect as deleting the watchtower.db for boltdb? Will my funds be safe?
I wouldn't like to find surprises, and I have this tower which is not working for me and I can't remove otherwise.
I'm running into this same issue. The problem is that LND keeps using an increasing amount of RAM because the tower is offline. After a few weeks all the RAM+SWAP are completely filled by this memory leak, and the OOM killer kills the LND process.
The simple workaround would be to remove the offline tower, but I'm unable to because of "tower has unacked updates".
So the end result is that I'm now stuck with an LND node which has a huge memory leak, and no way to solve it (except for setting wtclient.active=0).
Indeed it's basically a catch 22. You should remove the tower because the tower is inoperative, but you can't remove the tower because the tower is inoperative. Wonder what was the logic that led to having LND not allow removing a WT when unacked updates existed, not like abandoning those updates would cause anything bad since you're removing the tower anyway.
Wonder what was the logic that led to having LND not allow removing a WT when unacked updates existed, not like abandoning those updates would cause anything bad since you're removing the tower anyway.
The tower might be useless (decommissioned, broken, ...), but the unacked updates might be extremely important. Instead of dropping them, they should be forwarded to another WT (assuming one exists). In my understanding, that's what @ellemouton is doing in the linked PR.
For now you have to stop lnd, remove the wtclient.db, and restart lnd.
WTs have a few issues: they are either offline or out of sync (StateUpdateCodeClientBehind in StateUpdateReply for seqnum=1).
Either condition renders them useless; you can't remove them at runtime, and they cause a memory leak.
Glad to see that WTs that seemed like an absolute afterthought are being paid attention
After a few weeks all the RAM+SWAP are completely filled by this memory leak, and the OOM killer kills the LND process.
@kroese seems that I'm running into the same issue as you except that it happens randomly and sometimes even within a few minutes (16GB RAM). The last log message before my node gets killed (and then restarted by systemd) is always WTCL related:
2022-11-14 11:22:44.662 [ERR] WTCL: (anchor) SessionQueue(0374a7fdae50a0b28ca0a8b501f384fcbd8d18cb488ce28cb9c490c00c314cdfae) unable to dial tower at any available Addresses: dial proxy failed: socks connect tcp 127.0.0.1:9050->iiu4epqzm6cydqhezueenccjlyzrqeruntlzbx47mlmdgfwgtrll66qd.onion:9911: unknown error host unreachable
2022-11-14 11:23:45.126 [INF] LTND: Version: 0.15.99-beta commit=tor/v1.1.0-288-g31a803c93, build=production, logging=default, debuglevel=info,DISC=critical
@mariodian that's #5983. Either make sure the watchtower is reachable, remove the watchtower (you might need to include the latest patches from master for that, which is a bit tricky), or disable the watchtower client feature. Restarting lnd before the OOM killer strikes might also work.
@C-Otto thanks! I shut down lnd, removed wtclient.db and added back watchtowers that I know are online most of the time. This seems to be working for now.
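The workaround described here (shut down lnd, remove wtclient.db, re-add towers) could be sketched as below. This version moves the file aside instead of deleting it, so it can be put back if needed. `move_wtclient_db` is a hypothetical helper name, the path must be supplied by you since the thread does not state where wtclient.db lives, and the `pgrep` check is only a best-effort guard that assumes the process is named `lnd`. Note that, as discussed earlier in the thread, any unacked updates for the old towers are dropped with the file.

```shell
#!/bin/sh
# Sketch only: move wtclient.db aside after lnd has been stopped.
# move_wtclient_db is a hypothetical helper; pass the path to your own file.
move_wtclient_db() {
    db=$1
    if [ ! -f "$db" ]; then
        echo "no such file: $db" >&2
        return 1
    fi
    # Best-effort guard: refuse to touch the file while lnd seems to be running.
    if pgrep -x lnd >/dev/null 2>&1; then
        echo "lnd is still running; stop it first (for example: lncli stop)" >&2
        return 1
    fi
    # Keep a timestamped copy rather than deleting the database outright.
    backup="${db}.bak.$(date +%Y%m%d%H%M%S)"
    mv "$db" "$backup" && echo "$backup"
}
```

After restarting lnd, towers can be added back with `lncli wtclient add PUBLIC_KEY@HOST:PORT`.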