
slower payments with clboss running

[Open] marimes opened this issue 1 year ago • 10 comments

I am running node 02d695b01c7a6909e716c863fb39bc5fb7bbdc3824b7fdce53adc593e5be080e73 and have restarted clboss after several months of running the node without it. Clboss has been probing the network for three days straight, keeping the topology plugin at 100% CPU load. I also notice that payments (small and large) take significantly longer with clboss running (20-30 s) than without it (2-10 s). Is this expected behavior, or do I need to change something in my configuration?

marimes, Nov 27 '24 07:11

What version of CLBOSS are you running?

lightning-cli clboss-status

chrisguida, Nov 27 '24 18:11

I am running CLBOSS v0.14.1-rc2 on Core Lightning v24.08-modded.

marimes, Nov 27 '24 22:11

I've noticed topology being very busy (~90-100% CPU) on my node as well. My node has two cores, so one process at 100% consumes half of the total available CPU...
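For reference, tools like top report per-process %CPU relative to a single core by default, so 100% means one fully busy core. A small sketch of converting that figure into a share of the whole machine (the function name is just for illustration, and it assumes the per-process number is per-core):

```python
def share_of_total_cpu(process_percent: float, cores: int) -> float:
    """Convert a per-core CPU percentage (100% = one fully busy core)
    into a percentage of total machine capacity."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return process_percent / cores


# topology pegged at 100% on a two-core node
print(share_of_total_cpu(100.0, 2))  # 50.0
# the same load on a four-core VM
print(share_of_total_cpu(100.0, 4))  # 25.0
```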

How many cores does your system have?

ksedgwic, Nov 27 '24 23:11

The VM has four cores. I am okay with one core at 100% as long as the work is productive (I assume there is a lot to measure for 200 channels). The problem is responsiveness: waiting 20-30 seconds for a payment to clear when it usually takes only a few seconds.

marimes, Nov 28 '24 06:11

I totally agree; I was just making sure we weren't simply starving a single CPU...

ksedgwic, Dec 02 '24 16:12

Another resource I've had to tune is memory... Any chance your system is paging?

I run sar -W 60 in a window to make sure pswpin/s (swap pages brought in per second) stays in the single digits...
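To automate that check, here is a minimal sketch that scans captured sar -W output and flags intervals where pswpin/s reaches double digits. The function name swap_in_alerts, the threshold of 10, and the row regex are illustrative assumptions based on the two-column -W layout shown in this thread; sar's exact output varies with locale and sysstat version.

```python
import re

# Matches a sar -W data row: time (12h with AM/PM or 24h), pswpin/s, pswpout/s.
# Header lines ("pswpin/s pswpout/s"), the kernel banner, and "Average:" rows
# do not match and are skipped.
ROW = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}(?:\s+[AP]M)?)\s+"
    r"(?P<pswpin>\d+(?:\.\d+)?)\s+(?P<pswpout>\d+(?:\.\d+)?)\s*$"
)


def swap_in_alerts(sar_text: str, threshold: float = 10.0) -> list:
    """Return (time, pswpin/s) for every interval at or above `threshold`.

    The default of 10.0 is an assumed reading of "no more than single digits".
    """
    alerts = []
    for line in sar_text.splitlines():
        m = ROW.match(line.strip())
        if m and float(m["pswpin"]) >= threshold:
            alerts.append((m["time"], float(m["pswpin"])))
    return alerts
```

Feeding it the text saved from a `sar -W 60` session returns an empty list when paging stays in the single digits, which makes it easy to run from cron or a log watcher.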

ksedgwic, Dec 02 '24 17:12

Thank you, Ken, for looking into this.

I have to walk back my claim that clboss is slowing payments down. After I restarted clboss, I now get comparable performance ("tested" by zapping notes from the same nostr account). Also, topology is not running at 100% but at around 35%. I am trying to understand why it is different this time, and I will check again after clboss has been running for at least 24 hours. If it stays like this, I will be pretty happy.

24 hours later: everything is still working fine, with topology below 50%, no paging, and payments going through without delay.

The sar measurements with and without clboss are not significantly different. Without clboss, pswpin/s is mostly 0.00 and occasionally non-zero, peaking at 0.28.

Another observation, which may or may not be relevant: compared to the last time I wrote, ChannelFinderByPopularity now seems to be making real progress [plugin-clboss: ChannelFinderByPopularity: Progress: 10817 / 17348 (0.623530)]. Before, every time I checked, it seemed to have only just started and was still within the first percent.
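The number in parentheses is just done / total. A small sketch for pulling that out of the log so progress can be tracked over time; the regex assumes the log format of the single line quoted above, and parse_progress is a hypothetical helper, not part of CLBOSS:

```python
import re
from typing import Optional, Tuple

# Assumed format, inferred from one sample line:
# "ChannelFinderByPopularity: Progress: <done> / <total> (<fraction>)"
PROGRESS = re.compile(
    r"ChannelFinderByPopularity: Progress: (\d+) / (\d+) \(([\d.]+)\)"
)


def parse_progress(line: str) -> Optional[Tuple[int, int, float]]:
    """Return (done, total, reported_fraction), or None if the line doesn't match."""
    m = PROGRESS.search(line)
    if m is None:
        return None
    return int(m[1]), int(m[2]), float(m[3])


line = "plugin-clboss: ChannelFinderByPopularity: Progress: 10817 / 17348 (0.623530)"
parsed = parse_progress(line)
if parsed is not None:
    done, total, reported = parsed
    # Recompute the fraction to confirm it is simply done / total (~62%).
    print(f"{done} / {total} = {done / total:.6f} (log reports {reported})")
```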

Here is my sar output with clboss running and with all 0.00 measurements deleted:

marius@mlbb2:~$ sar -W 60
Linux 5.15.0-124-generic (mlbb2)        12/03/2024      _x86_64_        (4 CPU)

08:17:14 AM  pswpin/s pswpout/s
08:18:14 AM      0.07      0.00
08:21:14 AM      0.03      0.00
08:24:14 AM      0.07      0.00
08:25:14 AM      0.03      0.00
08:29:14 AM      0.02      0.00
08:30:14 AM      0.22      0.00
08:34:14 AM      0.20      0.00
08:40:14 AM      0.28      0.00
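A quick worked check of this output against the single-digit rule of thumb mentioned earlier in the thread. It is only a sketch: the list re-types the posted non-zero rows, and the limit of 10 is an assumed reading of "single digits".

```python
# pswpin/s values re-typed from the non-zero rows posted above;
# the all-zero rows were removed before posting, so they are absent here too.
posted = [
    ("08:18:14 AM", 0.07),
    ("08:21:14 AM", 0.03),
    ("08:24:14 AM", 0.07),
    ("08:25:14 AM", 0.03),
    ("08:29:14 AM", 0.02),
    ("08:30:14 AM", 0.22),
    ("08:34:14 AM", 0.20),
    ("08:40:14 AM", 0.28),
]
SINGLE_DIGIT_LIMIT = 10.0  # assumed reading of "no more than single digits"

peak_time, peak = max(posted, key=lambda row: row[1])
print(f"peak pswpin/s: {peak} at {peak_time}")
print(f"non-zero intervals: {len(posted)}")
print("within the single-digit rule of thumb:", peak < SINGLE_DIGIT_LIMIT)
```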

marimes, Dec 03 '24 10:12

Your paging looks good. Hmm.

I need to learn why CLBOSS is sometimes so topology-heavy.

Also, CLN is improving pay and xpay in the v24.11 release; I need to understand whether CLBOSS should (or could) use askrene...

ksedgwic, Dec 04 '24 16:12

@ksedgwic it looks like CLN can be set to have xpay take over for pay by setting the xpay-handle-pay config option: https://github.com/ElementsProject/lightning/releases/tag/v24.11rc2

chrisguida, Dec 04 '24 21:12

Cool. On the command line, I have lately been using renepay. Is there a good article that explains the difference between these new options?

marimes, Dec 09 '24 09:12