lnd icon indicating copy to clipboard operation
lnd copied to clipboard

[bug]: Graph builder takes 30+ minutes with postgres backend

Open AuthenticityBTC opened this issue 2 months ago • 9 comments

Pre-Submission Checklist

  • [x] I have searched the existing issues and believe this is a new bug.
  • [x] I am not asking a question about how to use lnd, but reporting a bug (otherwise open a discussion).

LND Version

0.20.0-beta.rc1

LND Configuration

[db] db.no-rev-log-amt-data=true

db.backend=postgres

db.postgres.dsn=postgres://db.postgres.timeout=360s db.use-native-sql=true

db.postgres.maxconnections=50

Backend Version

Bitcoin Core version v28.0

Backend Configuration

N/A

OS/Distribution

Ubuntu 24.04.3 LTS

Bug Details & Steps to Reproduce

On starting LND, it takes 30-45 minutes for LND to become healthy (after wallet password is input). The backend is using postgres and was migrated on v19.3 from bbolt. Initially when I migrated, LND only took 10~ minutes to become healthy, and over the months the restart time has increased.

[INF] GRPH: Builder starting

Expected Behavior

Other users in PLEBNET have mentioned their LND takes minutes to become healthy when running on postgres and it seems my load times are unexpected.

Debug Information

No response

Environment

No response

AuthenticityBTC avatar Oct 24 '25 16:10 AuthenticityBTC

@AuthenticityBTC - can you provide more info please? what the bar for healthy & what is the indication for the issue being the graph builder? can you maybe provide some more logs? is it that the graph builder is taking long to start? or is it that it is taking long for LND to sync the graph? (those are 2 separate things)

ellemouton avatar Oct 24 '25 16:10 ellemouton

Can you also share your postgres config?

Roasbeef avatar Oct 24 '25 17:10 Roasbeef

Is your start up time improved if you try out this config? https://gist.github.com/djkazic/526fa3e032aea9578997f88b45b91fb9

Roasbeef avatar Oct 24 '25 17:10 Roasbeef

`2025-10-23 14:20:21.753 [INF] BTWL: The wallet has been unlocked without a time limit 2025-10-23 14:20:21.884 [INF] CHRE: LightningWallet opened 2025-10-23 14:20:21.889 [INF] SRVR: Onion services are accessible via Tor! NOTE: Traffic to clearnet services is not routed via Tor. rev=90c96c 2025-10-23 14:20:21.889 [INF] TORC: Starting tor controller 2025-10-23 14:20:21.891 [INF] BTCN: Broadcaster now active 2025-10-23 14:20:39.439 [INF] BTWL: Started rescan from block 00000000000000000001607153a3ad45ef0098c32aa65ca802793f41b90a5263 (height 920407) for 1454 addrs, 21 outpoints 2025-10-23 14:20:39.445 [INF] BTWL: Catching up block hashes to height 920407, this might take a while 2025-10-23 14:20:39.445 [INF] BTWL: Done catching up block hashes 2025-10-23 14:20:39.445 [INF] BTWL: Finished rescan for 1454 addresses (synced to block 00000000000000000001607153a3ad45ef0098c32aa65ca802793f41b90a5263, height 920407) 2025-10-23 14:20:39.452 [INF] BTWL: 2b3b4e210882df5659b851e7c88d0057ac353aed2efa5aebdae1c8e757199c93: tx already confirmed

2025-10-23 14:21:01.626 [INF] ACSM: Access Manager initialized 2025-10-23 14:21:02.479 [INF] LTND: Channel backup proxy channel notifier starting 2025-10-23 14:21:02.479 [INF] CHIO: Registered queue=1 with 3 blockbeat consumers

2025-10-23 14:21:02.480 [INF] LTND: We're not running within systemd or the service type is not 'notify' 2025-10-23 14:21:02.481 [INF] LTND: Waiting for chain backend to finish sync rev=90c96c start_height=920407 2025-10-23 14:21:02.482 [INF] LTND: Chain backend is fully synced! rev=90c96c end_height=920407 2025-10-23 14:21:02.482 [INF] SRVR: Received initial block 00000000000000000001607153a3ad45ef0098c32aa65ca802793f41b90a5263 at height 920407 2025-10-23 14:21:02.482 [WRN] HLCK: check: disk space configured with 0 attempts, skipping it 2025-10-23 14:21:02.482 [WRN] HLCK: check: tls configured with 0 attempts, skipping it 2025-10-23 14:21:02.482 [WRN] HLCK: check: tor connection configured with 0 attempts, skipping it 2025-10-23 14:21:02.482 [INF] LNWL: SigPool starting 2025-10-23 14:21:02.482 [INF] CHNF: ChannelNotifier starting 2025-10-23 14:21:02.482 [INF] PRNF: PeerNotifier starting 2025-10-23 14:21:02.482 [INF] SWPR: TxPublisher starting... 2025-10-23 14:21:02.482 [INF] SWPR: Sweeper starting 2025-10-23 14:21:02.482 [INF] UTXN: UTXO nursery starting 2025-10-23 14:21:02.599 [INF] BRAR: Breach arbiter starting 2025-10-23 14:21:02.715 [INF] FNDG: Funding manager starting 2025-10-23 14:21:02.715 [INF] BRAR: Starting contract observer, watching for breaches. 2025-10-23 14:21:23.305 [INF] INVC: HtlcModificationInterceptor starting... ... 2025-10-23 14:21:39.241 [INF] GRPH: Builder starting ... 2025-10-23 14:52:35.416 [INF] DISC: Authenticated Gossiper starting

2025-10-23 14:52:36.299 [INF] NANN: Channel Status Manager starting 2025-10-23 14:52:36.970 [INF] DISC: Retransmitting 7 outgoing channels 2025-10-23 14:52:49.852 [INF] CHFT: ChannelEventStore starting... 2025-10-23 14:52:50.734 [INF] CHFT: Adding 211 channels to event store 2025-10-23 14:52:52.192 [INF] CHBU: chanbackup.SubSwapper starting

...

2025-10-23 14:52:52.310 [INF] CHBU: Updating backup file at 2025-10-23 14:52:52.314 [INF] CHBU: Swapping old multi backup file from 2025-10-23 14:52:52.314 [INF] CHBU: Archiving old channel backup to 2025-10-23 14:52:52.496 [INF] SRVR: connMgr starting... 2025-10-23 14:52:52.496 [INF] CMGR: Server listening on [::]:9735`

Logs from my last restart on Oct 23, 2025. I removed a number of entries to make this more compact (Replacing disk backup for channel point, sweeps, etc

When I did my initial migration on 19.3, I followed the tuning config you linked. I'll double check/verify that I didn't miss any settings (and attempt a restart if I did)

Can you also share your postgres config?

I have postgres running on AWS RDS. I'll research to see if I can export the settings and reply

AuthenticityBTC avatar Oct 24 '25 17:10 AuthenticityBTC

@AuthenticityBTC , can you turn on debug logs and start again? i think the GRPH: Builder starting being the last log before the delay is a trojan horse cause it is an info level bug.

ellemouton avatar Oct 28 '25 16:10 ellemouton

@AuthenticityBTC , can you turn on debug logs and start again? i think the GRPH: Builder starting being the last log before the delay is a trojan horse cause it is an info level bug.

Running in debug mode now and will have logs ready shortly. Is there an email / private way to share them (not sure if any sensitive data is inside)?

I am seeing the following, which seems a bit concerning to me. I started LND at 03:26:x.x and at 03:28:x.x I started to see PYDB: Scanning in flight payments. The PYDB finished at 03:57:x.x. The node becomes healthy at 04:03:x.x

[DBG] PYDB: Scanning inflight payments (in progress), processed 254802, last processed payment: payment_id=xx, amount=xx mSAT, created_at=2025-08-09 19:50:17.339593195 +0000 UTC

[DBG] PYDB: Scanning inflight payments (in progress), processed 307933, last processed payment: payment_id=xx, amount=xx mSAT, created_at=2025-08-19 07:15:49.944526481 +0000 UTC

I have not had time to look deeper into Postgres yet unfortunately.

AuthenticityBTC avatar Oct 31 '25 04:10 AuthenticityBTC

I have postgres running on AWS RDS.

Scanning the in-flight payments can take a long time, which is known. Can you say anything concerning network communication times? This may add a bottleneck and can lead to even longer loading times for inflight payments and could lead to the problems you see in the other issues as well.

bitromortac avatar Nov 03 '25 10:11 bitromortac

Is your start up time improved if you try out this config? https://gist.github.com/djkazic/526fa3e032aea9578997f88b45b91fb9

My postgres settings are the same as in here. The only deviations are for:

work_mem = 8MB (16 in the gist) shared_buffers = 1000MB - AWS default for my DB size (6000MB in the gist)

I have postgres running on AWS RDS.

Scanning the in-flight payments can take a long time, which is known. Can you say anything concerning network communication times? This may add a bottleneck and can lead to even longer loading times for inflight payments and could lead to the problems you see in the other issues as well.

Both the LND and Postgres servers are running in the same AZ (essentially same datacenter) so it's sub 1ms latency

I doubled the size of my DB server a few days ago and did a few more test restarts and still am having the same issue with long startup times.

AuthenticityBTC avatar Nov 06 '25 18:11 AuthenticityBTC

This is a known issue and will be fixed with payments being in native sql.

ziggie1984 avatar Nov 07 '25 08:11 ziggie1984