[bug]: Graph builder takes 30+ minutes with postgres backend
Pre-Submission Checklist
- [x] I have searched the existing issues and believe this is a new bug.
- [x] I am not asking a question about how to use lnd, but reporting a bug (otherwise open a discussion).
LND Version
0.20.0-beta.rc1
LND Configuration
```
[db]
db.no-rev-log-amt-data=true
db.backend=postgres
db.postgres.dsn=postgres://
db.postgres.timeout=360s
db.use-native-sql=true
db.postgres.maxconnections=50
```
Backend Version
Bitcoin Core version v28.0
Backend Configuration
N/A
OS/Distribution
Ubuntu 24.04.3 LTS
Bug Details & Steps to Reproduce
On starting LND, it takes 30-45 minutes for LND to become healthy (after the wallet password is input). The backend is postgres, migrated from bbolt on v0.19.3. Initially, after the migration, LND took only ~10 minutes to become healthy, and the restart time has increased over the months.
```
[INF] GRPH: Builder starting
```
Expected Behavior
Other users in PLEBNET have mentioned that their LND takes only minutes to become healthy when running on postgres, so my load times seem unexpectedly long.
Debug Information
No response
Environment
No response
@AuthenticityBTC - can you provide more info please? What is the bar for "healthy", and what indicates that the graph builder is the issue? Can you maybe provide some more logs? Is it that the graph builder is taking long to start, or that LND is taking long to sync the graph? (Those are two separate things.)
Can you also share your postgres config?
Is your start up time improved if you try out this config? https://gist.github.com/djkazic/526fa3e032aea9578997f88b45b91fb9
```
2025-10-23 14:20:21.753 [INF] BTWL: The wallet has been unlocked without a time limit
2025-10-23 14:20:21.884 [INF] CHRE: LightningWallet opened
2025-10-23 14:20:21.889 [INF] SRVR: Onion services are accessible via Tor! NOTE: Traffic to clearnet services is not routed via Tor. rev=90c96c
2025-10-23 14:20:21.889 [INF] TORC: Starting tor controller
2025-10-23 14:20:21.891 [INF] BTCN: Broadcaster now active
2025-10-23 14:20:39.439 [INF] BTWL: Started rescan from block 00000000000000000001607153a3ad45ef0098c32aa65ca802793f41b90a5263 (height 920407) for 1454 addrs, 21 outpoints
2025-10-23 14:20:39.445 [INF] BTWL: Catching up block hashes to height 920407, this might take a while
2025-10-23 14:20:39.445 [INF] BTWL: Done catching up block hashes
2025-10-23 14:20:39.445 [INF] BTWL: Finished rescan for 1454 addresses (synced to block 00000000000000000001607153a3ad45ef0098c32aa65ca802793f41b90a5263, height 920407)
2025-10-23 14:20:39.452 [INF] BTWL: 2b3b4e210882df5659b851e7c88d0057ac353aed2efa5aebdae1c8e757199c93: tx already confirmed
2025-10-23 14:21:01.626 [INF] ACSM: Access Manager initialized
2025-10-23 14:21:02.479 [INF] LTND: Channel backup proxy channel notifier starting
2025-10-23 14:21:02.479 [INF] CHIO: Registered queue=1 with 3 blockbeat consumers
2025-10-23 14:21:02.480 [INF] LTND: We're not running within systemd or the service type is not 'notify'
2025-10-23 14:21:02.481 [INF] LTND: Waiting for chain backend to finish sync rev=90c96c start_height=920407
2025-10-23 14:21:02.482 [INF] LTND: Chain backend is fully synced! rev=90c96c end_height=920407
2025-10-23 14:21:02.482 [INF] SRVR: Received initial block 00000000000000000001607153a3ad45ef0098c32aa65ca802793f41b90a5263 at height 920407
2025-10-23 14:21:02.482 [WRN] HLCK: check: disk space configured with 0 attempts, skipping it
2025-10-23 14:21:02.482 [WRN] HLCK: check: tls configured with 0 attempts, skipping it
2025-10-23 14:21:02.482 [WRN] HLCK: check: tor connection configured with 0 attempts, skipping it
2025-10-23 14:21:02.482 [INF] LNWL: SigPool starting
2025-10-23 14:21:02.482 [INF] CHNF: ChannelNotifier starting
2025-10-23 14:21:02.482 [INF] PRNF: PeerNotifier starting
2025-10-23 14:21:02.482 [INF] SWPR: TxPublisher starting...
2025-10-23 14:21:02.482 [INF] SWPR: Sweeper starting
2025-10-23 14:21:02.482 [INF] UTXN: UTXO nursery starting
2025-10-23 14:21:02.599 [INF] BRAR: Breach arbiter starting
2025-10-23 14:21:02.715 [INF] FNDG: Funding manager starting
2025-10-23 14:21:02.715 [INF] BRAR: Starting contract observer, watching for breaches.
2025-10-23 14:21:23.305 [INF] INVC: HtlcModificationInterceptor starting...
...
2025-10-23 14:21:39.241 [INF] GRPH: Builder starting
...
2025-10-23 14:52:35.416 [INF] DISC: Authenticated Gossiper starting
2025-10-23 14:52:36.299 [INF] NANN: Channel Status Manager starting
2025-10-23 14:52:36.970 [INF] DISC: Retransmitting 7 outgoing channels
2025-10-23 14:52:49.852 [INF] CHFT: ChannelEventStore starting...
2025-10-23 14:52:50.734 [INF] CHFT: Adding 211 channels to event store
2025-10-23 14:52:52.192 [INF] CHBU: chanbackup.SubSwapper starting
...
2025-10-23 14:52:52.310 [INF] CHBU: Updating backup file at
2025-10-23 14:52:52.314 [INF] CHBU: Swapping old multi backup file from
2025-10-23 14:52:52.314 [INF] CHBU: Archiving old channel backup to
2025-10-23 14:52:52.496 [INF] SRVR: connMgr starting...
2025-10-23 14:52:52.496 [INF] CMGR: Server listening on [::]:9735
```
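The delay the logs point at can be measured mechanically rather than eyeballed. A minimal sketch (not part of lnd; the `largest_gap` helper is hypothetical) that parses the timestamp prefix of each entry and reports the biggest pause between consecutive lines, using a few of the entries above:

```python
from datetime import datetime

def largest_gap(lines):
    """Return (seconds, line_before_gap) for the longest pause between
    consecutive log entries. Assumes lnd's 'YYYY-MM-DD HH:MM:SS.mmm' prefix."""
    stamps = []
    for line in lines:
        try:
            ts = datetime.strptime(line[:23], "%Y-%m-%d %H:%M:%S.%f")
        except ValueError:
            continue  # skip elided "..." lines and anything non-timestamped
        stamps.append((ts, line))
    best = (0.0, None)
    for (a, line_a), (b, _) in zip(stamps, stamps[1:]):
        gap = (b - a).total_seconds()
        if gap > best[0]:
            best = (gap, line_a.rstrip())
    return best

logs = [
    "2025-10-23 14:21:23.305 [INF] INVC: HtlcModificationInterceptor starting...",
    "2025-10-23 14:21:39.241 [INF] GRPH: Builder starting",
    "...",
    "2025-10-23 14:52:35.416 [INF] DISC: Authenticated Gossiper starting",
]
gap, before = largest_gap(logs)
print(f"{gap:.0f}s after: {before}")  # ~1856s (about 31 minutes) after the GRPH line
```

On the sample above this flags the roughly 31-minute gap right after `GRPH: Builder starting`, which matches the startup stall described in this report.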
These are logs from my last restart on Oct 23, 2025. I removed a number of entries to make this more compact (replacing disk backup for channel point, sweeps, etc.).
When I did my initial migration on v0.19.3, I followed the tuning config you linked. I'll double-check that I didn't miss any settings (and attempt a restart if I did).
> Can you also share your postgres config?
I have postgres running on AWS RDS. I'll research to see if I can export the settings and reply
@AuthenticityBTC, can you turn on debug logs and start again? I think `GRPH: Builder starting` being the last log before the delay is a trojan horse, because it is an info-level log.
Running in debug mode now and will have logs ready shortly. Is there an email / private way to share them (not sure if any sensitive data is inside)?
I am seeing the following, which seems a bit concerning. I started LND at 03:26:x.x, and at 03:28:x.x I started to see `PYDB: Scanning inflight payments`. The PYDB scan finished at 03:57:x.x, and the node became healthy at 04:03:x.x.
```
[DBG] PYDB: Scanning inflight payments (in progress), processed 254802, last processed payment: payment_id=xx, amount=xx mSAT, created_at=2025-08-09 19:50:17.339593195 +0000 UTC
[DBG] PYDB: Scanning inflight payments (in progress), processed 307933, last processed payment: payment_id=xx, amount=xx mSAT, created_at=2025-08-19 07:15:49.944526481 +0000 UTC
```
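Those progress counters allow a rough throughput estimate. A back-of-the-envelope sketch, assuming the scan ran from roughly 03:28 to 03:57 (about 29 minutes, per the timings above) and had processed at least 307,933 payment records by its end:

```python
# Rough in-flight payment scan throughput from the debug output above.
# Assumption: ~29 minutes elapsed and >= 307,933 records processed;
# both numbers come from the comment, not from lnd itself.
processed = 307_933
elapsed_s = 29 * 60           # 29 minutes in seconds
rate = processed / elapsed_s  # payments scanned per second
print(f"~{rate:.0f} payments/s over {elapsed_s / 60:.0f} min")  # ~177 payments/s
```

At a couple of hundred records per second, a node with hundreds of thousands of historical payments spends tens of minutes in this scan, which is consistent with the startup times reported here.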
I have not had time to look deeper into Postgres yet unfortunately.
> I have postgres running on AWS RDS.

Scanning the in-flight payments can take a long time, which is a known issue. Can you say anything about network communication times? Latency there may add a bottleneck, leading to even longer loading times for the in-flight payments, and could explain the problems you see in the other issues as well.
> Is your startup time improved if you try out this config? https://gist.github.com/djkazic/526fa3e032aea9578997f88b45b91fb9
My postgres settings are the same as in that gist. The only deviations are:

- `work_mem = 8MB` (16MB in the gist)
- `shared_buffers = 1000MB`, the AWS default for my DB size (6000MB in the gist)
> I have postgres running on AWS RDS.
>
> Scanning the in-flight payments can take a long time, which is a known issue. Can you say anything about network communication times?
Both the LND and Postgres servers are running in the same AZ (essentially same datacenter) so it's sub 1ms latency
I doubled the size of my DB server a few days ago and did a few more test restarts, and I am still seeing the same long startup times.
This is a known issue and will be fixed once payments are migrated to native SQL.