
autoclean performance

Open michael1011 opened this issue 2 years ago • 9 comments

The autoclean-once command is really slow and uses a lot of memory. The memory consumption is especially concerning and could become a serious problem for bigger nodes.

image

We are running a CLN node with a Postgres database, and after a month of operation I wanted to clean failed forwards from the database, because that data isn't relevant for us.

I ran autoclean-once; the command took more than an hour to run and allocated quite a bit of memory that wasn't released after it finished. Also, it seems like autoclean removes rows from the database either individually or in very small batches, which is not optimal for nodes running a Postgres database with synchronous replication.

michael1011 avatar Oct 04 '23 13:10 michael1011

Can confirm the relatively high memory usage. This is with an sqlite database.

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                          
 448811 nil       20   0 2619072   2.4g      0 S   6.3   3.9   1:29.69 autoclean 

autoclean-once did not take an hour+ in my case, though I do have a beefy machine.

"cleaned": 1802687,

grubles avatar Oct 04 '23 14:10 grubles

> though I do have a beefy machine.

So do we. But I think Postgres with sync replication adds a lot of latency when you delete one entry, commit, and then have to wait for the network to ensure that the replica is in sync. It's not bottlenecked by CPU, RAM, or disk.

My personal CLN node with SQLite was much faster at running autoclean.

michael1011 avatar Oct 04 '23 17:10 michael1011

Autoclean is @rustyrussell's baby; maybe he has an idea on how to speed this up. I can take a look at the memory usage, btw.

vincenzopalazzo avatar Oct 04 '23 19:10 vincenzopalazzo

So, autoclean specifically does delete individually, but also disables transactions opportunistically (they'll still occur if other things need transactions, but if it's back-to-back autoclean they'll be merged). This allows for a much more controlled deletion than if we restricted it to simple database queries, though it's slower.

However, it will happily throw a million simultaneous delete commands at CLN, if it finds that many records! That's going to take a lot of memory. We should stage it into (say) no more than 1000 at once, then wait until those are processed. This will require a rework of the infrastructure inside the plugin, however.

rustyrussell avatar Oct 09 '23 05:10 rustyrussell
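
The staging idea in the comment above (no more than about 1000 deletes at once, then wait for those to be processed) can be sketched roughly as follows. This is only an illustration in Python, not the plugin's actual code: `delete_record` is a hypothetical stand-in for one delete command sent to CLN, and the counters exist only to show that the number of pending commands stays bounded.

```python
import asyncio

BATCH_SIZE = 1000  # the "(say) no more than 1000 at once" figure from the comment


async def delete_record(record_id, state):
    """Hypothetical stand-in for a single delete command sent to CLN."""
    state["in_flight"] += 1
    state["peak"] = max(state["peak"], state["in_flight"])
    await asyncio.sleep(0)  # yield control, as a real RPC round trip would
    state["in_flight"] -= 1
    state["deleted"] += 1


async def clean_in_stages(record_ids, batch_size=BATCH_SIZE):
    """Issue deletes in stages and wait for each stage before starting the next."""
    state = {"in_flight": 0, "peak": 0, "deleted": 0}
    for start in range(0, len(record_ids), batch_size):
        batch = record_ids[start:start + batch_size]
        # Only this batch is pending at any time, so memory use no longer
        # grows with the total number of records to delete.
        await asyncio.gather(*(delete_record(r, state) for r in batch))
    return state


def run_demo(total=5000):
    """Run the staged cleanup over `total` fake record ids and return the counters."""
    return asyncio.run(clean_in_stages(list(range(total))))
```

Calling `run_demo(5000)` deletes all 5000 fake records while never having more than `BATCH_SIZE` commands pending at once.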

> This allows for a much more controlled deletion than if we restricted it to simple database queries, though it's slower.

I am not exactly sure what you mean by that. When I tell autoclean to for example delete all failed forwards older than one day, there is no control needed apart from WHERE state = ? AND resolved_time < ?.

> However, it will happily throw a million simultaneous delete commands at CLN, if it finds that many records! That's going to take a lot of memory.

This implies that the memory consumption grows linearly with the number of records to delete, which could make it unusable for big nodes with lots of entries that aren't removed regularly

michael1011 avatar Oct 09 '23 10:10 michael1011
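
For comparison, the set-based deletion described in the comment above can be sketched with Python's built-in `sqlite3` module. The `forwards` table here is a simplified, hypothetical schema containing only the two columns named in the WHERE clause (plus an id), and the text `state` values are an assumption for readability; CLN's real schema differs.

```python
import sqlite3
import time

# Hypothetical, simplified schema: not CLN's actual table layout.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE forwards ("
    " id INTEGER PRIMARY KEY, state TEXT, resolved_time INTEGER)"
)

now = int(time.time())
one_day = 24 * 60 * 60
rows = [
    ("failed", now - 3 * one_day),   # old failed forward: should be deleted
    ("failed", now - 60),            # recent failed forward: kept
    ("settled", now - 3 * one_day),  # old but not failed: kept
]
conn.executemany("INSERT INTO forwards (state, resolved_time) VALUES (?, ?)", rows)
conn.commit()


def delete_old_forwards(conn, state, cutoff):
    """Delete every matching row with one statement and one commit."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "DELETE FROM forwards WHERE state = ? AND resolved_time < ?",
            (state, cutoff),
        )
    return cur.rowcount


deleted = delete_old_forwards(conn, "failed", now - one_day)
remaining = conn.execute("SELECT COUNT(*) FROM forwards").fetchone()[0]
```

With a single statement there is one commit to wait for, regardless of how many rows match, which is the property that matters for a database with synchronous replication.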

> This implies that the memory consumption grows linearly with the number of records to delete, which could make it unusable for big nodes with lots of entries that aren't removed regularly

Correct, but this is a bug in our implementation, not a limitation of the approach. We should just iterate over the entries in batches: load the first 1k, run the code on them, then move on to the next batch.

I will put it on my todo :)

vincenzopalazzo avatar Oct 09 '23 11:10 vincenzopalazzo
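
One possible reading of "iterate by batch" at the database level is a loop that removes at most 1k matching rows per transaction. The sketch below uses Python's `sqlite3` module and a hypothetical `forwards` table; it is an assumption about what batching could look like, not a description of how the plugin was eventually changed.

```python
import sqlite3


def delete_in_batches(conn, state, cutoff, batch_size=1000):
    """Delete matching rows in bounded batches, committing after each batch."""
    batch_sizes = []
    while True:
        with conn:  # one transaction (and one commit) per batch
            cur = conn.execute(
                "DELETE FROM forwards WHERE id IN ("
                " SELECT id FROM forwards"
                " WHERE state = ? AND resolved_time < ? LIMIT ?)",
                (state, cutoff, batch_size),
            )
        if cur.rowcount == 0:
            break  # nothing left that matches
        batch_sizes.append(cur.rowcount)
    return batch_sizes


# Demo on an in-memory database with a hypothetical, simplified schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE forwards ("
    " id INTEGER PRIMARY KEY, state TEXT, resolved_time INTEGER)"
)
conn.executemany(
    "INSERT INTO forwards (state, resolved_time) VALUES (?, ?)",
    [("failed", 100)] * 2500 + [("settled", 100)] * 10,
)
conn.commit()

batches = delete_in_batches(conn, "failed", cutoff=200)
left = conn.execute("SELECT COUNT(*) FROM forwards").fetchone()[0]
```

Each iteration touches at most `batch_size` rows, so neither the working set nor the size of a single transaction depends on how many records have accumulated.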

> This allows for a much more controlled deletion than if we restricted it to simple database queries, though it's slower.

What criteria are you imagining that you can't code in the WHERE clause of a DELETE statement? Pulling every record from the database table, only to turn around and issue millions of individual DELETE statements, is simply not the way to do it.

> which could make it unusable for big nodes with lots of entries that aren't removed regularly

Indeed. I cannot use the autoclean plugin because it OOMs my machine.

whitslack avatar Jun 13 '24 23:06 whitslack

Any updates? @vincenzopalazzo

kilrau avatar Jul 04 '24 09:07 kilrau

Was this fixed by https://github.com/ElementsProject/lightning/pull/7298?

michael1011 avatar Jul 04 '24 09:07 michael1011

Yes, this was fixed by #7298 and released in v24.08.

michael1011 avatar Sep 12 '24 17:09 michael1011