cardano-db-sync
db-sync doesn't move
db-sync 13.5.0.2 with PostgreSQL 14 is stuck here and not moving. Eventually it fails and goes back to the same stage. A higher work_mem is also set in Postgres.
[db-sync-node:Warning:81] [2024-10-18 11:25:58.77 UTC] Creating Indexes. This may require an extended period of time to perform. Setting a higher maintenance_work_mem from Postgres usually speeds up this process. These indexes are not used by db-sync but are meant for clients. If you want to skip some of these indexes, you can stop db-sync, delete or modify any migration-4-* files in the schema directory and restart it
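(For context, the skip described in that warning would look roughly like the sketch below. The container name "db-sync" and the schema path are assumptions based on the paths that appear later in this thread; adapt them to your deployment.)

# Sketch only: move aside the client-facing index migrations that db-sync itself does not need.
docker exec db-sync sh -c 'mkdir -p /tmp/skipped-migrations && mv /home/cardano/cardano-db-sync/schema/migration-4-* /tmp/skipped-migrations/'
docker restart db-sync   # restart so the remaining migrations run without the migration-4-* index files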
error
[db-sync-node:Error:81] [2024-10-18 12:58:12.34 UTC] runDBThread: SqlError {sqlState = "", sqlExecStatus = FatalError, sqlErrorMsg = "", sqlErrorDetail = "", sqlErrorHint = ""}
Please help with this.
After you get the error, are the indices still being created? Try running this query:
select * from pg_stat_progress_create_index
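(A slightly more focused version of that query, using the standard columns of pg_stat_progress_create_index, can be run from inside the Postgres container; the container name "postgres" and database name "cexplorer" are assumptions here.)

docker exec -it postgres psql -U postgres -d cexplorer -c "
  SELECT pid, phase, blocks_done, blocks_total, tuples_done, tuples_total
  FROM pg_stat_progress_create_index;"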
Can you also check whether PostgreSQL and cardano-node are still running at that point (a few quick checks are sketched after the list below)? If you could give more information about your environment, that would be helpful:
- Are you using db-sync in docker or natively?
- How are you running postgresql (OS package? command line? systemd?)
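(Quick checks, sketched with assumed container names; adjust to your setup.)

docker ps --filter name=postgres --filter name=cardano   # are both containers still up?
docker exec postgres pg_isready                          # is Postgres still accepting connections?
docker logs --tail 50 cardano-node                       # is the node still making progress?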
Sorry, I should have given more details earlier.
I am pretty confused by the nature of the issues we have.
We are running both db-sync and Postgres in Docker.
I ran the select query you gave; it didn't return anything.
I have noticed some strange behaviour. Sometimes it gives this error after waiting at this point:
[db-sync-node:Info:6] [2024-10-21 11:05:09.83 UTC] Found maintenance_work_mem=2GB, max_parallel_maintenance_workers=4 ExitFailure 2
Errors in file: /tmp/migrate-2024-10-21T11:05:09.835467022Z.log
Sometimes it gives the error that I mentioned earlier.
After throwing this error, the container restarts and starts syncing again. I can see it is waiting here now:
[db-sync-node:Info:81] [2024-10-21 12:41:50.32 UTC] Received block which is not in the db with HeaderFields {headerFieldSlot = SlotNo 137938707, headerFieldBlockNo = BlockNo 10990914, headerFieldHash = a544cd2f7bf24902ac5d9b0f674f67b02f46254b82fe8a6fafa58758f7956fba}. Time to restore consistency.
[db-sync-node:Info:81] [2024-10-21 12:41:50.32 UTC] Starting at epoch 516
I think it will error out after this; I am watching while it waits.
This message:
Errors in file: /tmp/migrate-2024-10-21T11:05:09.835467022Z.log
Indicates there is a problem running a migration, which will cause db-sync to exit. Can you post the contents of that file?
I kept losing that file because the container restarts. I am tailing the file right now; it doesn't show any messages yet.
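(One way to keep that file across restarts, sketched under the assumption that the db-sync container is named "db-sync": copy it out while the container exists, or persist /tmp via a bind mount.)

# Copy the migration log out before the container is restarted/removed:
docker cp db-sync:/tmp/migrate-2024-10-21T11:05:09.835467022Z.log ./
# Or, in docker-compose, bind-mount a host directory so the logs survive restarts:
#   volumes:
#     - ./db-sync-tmp:/tmp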
Just now it crashed like this:
[db-sync-node:Info:81] [2024-10-21 12:41:50.32 UTC] Starting at epoch 516
[db-sync-node:Error:81] [2024-10-21 14:41:53.13 UTC] runDBThread: libpq: failed (no connection to the server )
[db-sync-node:Error:111] [2024-10-21 14:41:53.13 UTC] recvMsgRollForward: AsyncCancelled
[db-sync-node:Error:106] [2024-10-21 14:41:53.13 UTC] ChainSyncWithBlocksPtcl: AsyncCancelled
[db-sync-node.Subscription:Error:102] [2024-10-21 14:41:53.13 UTC] Identity Application Exception: LocalAddress "/home/cardano/ipc/node.socket" SubscriberError {seType = SubscriberWorkerCancelled, seMessage = "SubscriptionWorker exiting", seStack = []}
cardano-db-sync: libpq: failed (no connection to the server )
This is from the migration logs:
Running : migration-1-0000-20190730.sql init
(1 row)
Running : migration-1-0001-20190730.sql migrate
(1 row)
Running : migration-1-0002-20190912.sql
psql:/home/cardano/cardano-db-sync/schema/migration-1-0002-20190912.sql:32: NOTICE: Dropping view : "utxo_byron_view"
psql:/home/cardano/cardano-db-sync/schema/migration-1-0002-20190912.sql:32: NOTICE: Dropping view : "utxo_view"
drop_cexplorer_views
(1 row)
Running : migration-1-0003-20200211.sql migrate
(1 row)
Running : migration-1-0004-20201026.sql migrate
(1 row)
Running : migration-1-0005-20210311.sql migrate
(1 row)
Running : migration-1-0006-20210531.sql migrate
(1 row)
Running : migration-1-0007-20210611.sql migrate
(1 row)
Running : migration-1-0008-20210727.sql migrate
(1 row)
Running : migration-1-0009-20210727.sql migrate
(1 row)
Running : migration-1-0010-20230612.sql migrate
(1 row)
Running : migration-1-0011-20230814.sql migrate
(1 row)
Running : migration-1-0012-20240211.sql migrate
(1 row)
Running : migration-1-0013-20240318.sql migrate
(1 row)
Running : migration-1-0014-20240411.sql migrate
(1 row)
Running : migration-1-0015-20240724.sql migrate
(1 row)
Running : migration-2-0001-20211003.sql migrate
(1 row)
Running : migration-2-0002-20211007.sql migrate
(1 row)
Running : migration-2-0003-20211013.sql migrate
(1 row)
Running : migration-2-0004-20211014.sql migrate
(1 row)
Running : migration-2-0005-20211018.sql migrate
(1 row)
Running : migration-2-0006-20220105.sql migrate
(1 row)
Running : migration-2-0007-20220118.sql migrate
(1 row)
Running : migration-2-0008-20220126.sql migrate
(1 row)
Running : migration-2-0009-20220207.sql migrate
(1 row)
Running : migration-2-0010-20220225.sql migrate
(1 row)
Running : migration-2-0011-20220318.sql migrate
(1 row)
Running : migration-2-0012-20220502.sql migrate
(1 row)
Running : migration-2-0013-20220505.sql migrate
(1 row)
Running : migration-2-0014-20220505.sql migrate
(1 row)
Running : migration-2-0015-20220505.sql migrate
(1 row)
Running : migration-2-0016-20220524.sql migrate
(1 row)
Running : migration-2-0017-20220526.sql migrate
(1 row)
Running : migration-2-0018-20220604.sql migrate
(1 row)
Running : migration-2-0019-20220615.sql migrate
(1 row)
Running : migration-2-0020-20220919.sql migrate
(1 row)
Running : migration-2-0021-20221019.sql migrate
(1 row)
Running : migration-2-0022-20221020.sql migrate
(1 row)
Running : migration-2-0023-20221019.sql migrate
(1 row)
Running : migration-2-0024-20221020.sql migrate
(1 row)
Running : migration-2-0025-20221020.sql migrate
(1 row)
Running : migration-2-0026-20231017.sql migrate
(1 row)
Running : migration-2-0027-20230713.sql migrate
(1 row)
Running : migration-2-0028-20240117.sql migrate
(1 row)
Running : migration-2-0029-20240117.sql migrate
(1 row)
Running : migration-2-0030-20240108.sql migrate
(1 row)
Running : migration-2-0031-20240117.sql migrate
(1 row)
Running : migration-2-0032-20230815.sql migrate
(1 row)
Running : migration-2-0033-20231009.sql migrate
(1 row)
Running : migration-2-0034-20240301.sql migrate
(1 row)
Running : migration-2-0035-20240308.sql migrate
(1 row)
Running : migration-2-0036-20240318.sql migrate
(1 row)
Running : migration-2-0037-20240403.sql migrate
(1 row)
Running : migration-2-0038-20240603.sql migrate
(1 row)
Running : migration-2-0039-20240703.sql migrate
(1 row)
Running : migration-2-0040-20240626.sql migrate
(1 row)
Running : migration-2-0041-20240711.sql migrate
(1 row)
Running : migration-2-0042-20240808.sql migrate
(1 row)
Running : migration-2-0043-20240828.sql migrate
(1 row)
Running : migration-3-0001-20190816.sql
Running : migration-3-0002-20200521.sql
psql:/home/cardano/cardano-db-sync/schema/migration-3-0002-20200521.sql:4: server closed the connection unexpectedly
This probably means the server terminated abnormally before or while processing the request.
psql:/home/cardano/cardano-db-sync/schema/migration-3-0002-20200521.sql:4: error: connection to server was lost
ExitFailure 2
Is it possible you're running out of memory? It seems clear from the logs that you're losing the connection to the Postgres server.
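(A quick way to confirm an out-of-memory kill at the container level, sketched with an assumed container name "postgres":)

docker inspect --format '{{.State.OOMKilled}} {{.State.ExitCode}} {{.State.FinishedAt}}' postgres
docker stats --no-stream postgres    # current memory usage versus the container's limit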
listen_addresses = '*'
port = '5432'
max_connections = '600'
shared_buffers = '32GB'
effective_cache_size = '96GB'
maintenance_work_mem = '2GB'
checkpoint_completion_target = '0.9'
wal_buffers = '16MB'
default_statistics_target = '100'
random_page_cost = '1.0'
effective_io_concurrency = '200'
work_mem = '8GB'
min_wal_size = '1GB'
max_wal_size = '4GB'
max_worker_processes = '128'
max_parallel_workers_per_gather = '16'
max_parallel_workers = '64'
max_parallel_maintenance_workers = '4'
log_min_duration_statement = '2000'
This is our postgresql.conf file. I do see high memory consumption, but it's not at 100%. Do you suggest any changes to the above?
You might want to check out this tool: https://pgtune.leopard.in.ua/. It is what I used to generate my configuration. For my config, I chose "online transaction processing system".
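(For illustration only: the values below are a rough sketch of more conservative memory settings, not a recommendation for your specific hardware. The main point is that work_mem is a per-sort/per-hash limit applied per operation, so work_mem = 8GB combined with max_connections = 600 can, in the worst case, ask for far more memory than the host has.)

# postgresql.conf sketch (illustrative values; generate real ones with pgtune for your RAM/cores)
max_connections = '100'
shared_buffers = '32GB'
effective_cache_size = '96GB'
maintenance_work_mem = '2GB'
work_mem = '64MB'                 # per operation, multiplied across connections and parallel workers
max_parallel_workers = '16'
max_parallel_workers_per_gather = '4'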
Here is the error; I was able to drill down to this:
2024-10-23 14:57:27.050 GMT [176] LOG: could not receive data from client: Connection reset by peer
2024-10-23 14:57:27.050 GMT [176] LOG: unexpected EOF on client connection with an open transaction
That error simply says a client connection was terminated.
You would need to look at the reason your Postgres DB crashed (if needed, look at it outside of Docker first). It could be any of a myriad of reasons, e.g. running out of infrastructure memory (check for OOM messages in the system logs), ulimits, or corrupted DB WAL markers if you haven't cleared the existing DB before, etc. A few of those checks are sketched below.
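(Host-level checks corresponding to those suggestions, sketched with an assumed Postgres container name:)

# Kernel OOM-killer messages on the host:
dmesg -T | grep -i -E 'out of memory|killed process'
journalctl -k --since "1 day ago" | grep -i oom
# Resource limits as seen inside the Postgres container:
docker exec postgres sh -c 'ulimit -a'
# Postgres server log around the time of the crash:
docker logs --since 1h postgres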
IMO, GitHub is not the right medium for troubleshooting system/infra issues. Discord, the forum, or Stack Exchange would be better places to search for existing threads or to start a new one with a better synopsis than what's presented here.
@NanuIjaz did you manage to take a look at your postgres instance as advised?