server process (PID 229) was terminated by signal 7: Bus error
PostgreSQL 14 deployed in Kubernetes with 64Gi of huge pages enabled, running a TPC-C benchmark with 800 warehouses (~80GB data size).
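(The manifest itself isn't attached; the snippet below is only a sketch of how such a deployment is usually expressed, with the pod name, regular-memory figure, and 2Mi page size all assumed for illustration.)

```yaml
# Hypothetical sketch, not the reporter's actual manifest.
apiVersion: v1
kind: Pod
metadata:
  name: postgresql
spec:
  containers:
    - name: postgresql
      image: postgres:14
      resources:
        # Huge pages are requested as a dedicated hugepages-<size> resource;
        # requests must equal limits, and the limit is enforced as a hard
        # hugetlb cgroup limit on the container.
        requests:
          memory: "8Gi"            # assumed value for regular memory
          hugepages-2Mi: "64Gi"    # assumed 2Mi page size (could be hugepages-1Gi)
        limits:
          memory: "8Gi"
          hugepages-2Mi: "64Gi"
```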
$ free -g
              total        used        free      shared  buff/cache   available
Mem:            503         134         256           0         113         366
Swap:             0           0           0
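Note that free does not count huge pages at all, so the 64Gi pool is invisible in the output above. A couple of standard checks (commands only; <node> is a placeholder):

```sh
# Huge page pool as the kernel sees it: HugePages_Total, HugePages_Free, Hugepagesize
grep -i huge /proc/meminfo

# Huge pages as Kubernetes accounts for them on the node (hugepages-<size> resources)
kubectl describe node <node> | grep -i hugepages
```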
kubectl version
Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.4", GitCommit:"e6c093d87ea4cbb530a7b2ae91e54c0842d8308a", GitTreeState:"clean", BuildDate:"2022-02-16T12:38:05Z", GoVersion:"go1.17.7", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.5", GitCommit:"c285e781331a3785a7f436042c65c5641ce8a9e9", GitTreeState:"clean", BuildDate:"2022-03-16T15:52:18Z", GoVersion:"go1.17.8", Compiler:"gc", Platform:"linux/amd64"}
Detailed errors:
kubectl logs -n sfdev-test-static-hammerdb-tpcc-postgresql-disk-hugepage-on-128 postgresql-7df8c8b6c4-q9jz2
....
.....
2022-04-05 11:55:28.034 GMT [1] LOG: database system is ready to accept connections
2022-04-05 12:06:05.719 GMT [1] LOG: server process (PID 229) was terminated by signal 7: Bus error
2022-04-05 12:06:05.719 GMT [1] DETAIL: Failed process was running: insert into order_line (ol_o_id, ol_d_id, ol_w_id, ol_number, ol_i_id, ol_supply_w_id, ol_quantity, ol_amount, ol_dist_info, ol_delivery_d) values ('2601','6','798','1','78438','798','5','0.0','WYsLGpch9MMPlrLtOvBAcfPY',null),('2601','6','798','2','78366','798','5','0.0','dkbbzWuT2fYGEA46ysZT4Soq',null),('2601','6','798','3','83817','798','5','0.0','akE67RK2vhlovKtmTT9h4OR6',null),('2601','6','798','4','36375','798','5','0.0','ayTlJV0KL3FbYB4w57kes8VY',null),('2601','6','798','5','40293','798','5','0.0','ZXl83xwLLJnEjeRTwHpayPfy',null),('2601','6','798','6','5079','798','5','0.0','S3zeBuSPobwm9mjZj6tnuShV',null),('2601','6','798','7','73194','798','5','0.0','teE2ELm3J5a81TK2McP10ojr',null),('2601','6','798','8','74758','798','5','0.0','yaykpdVaqKrKZdtSujY2WpNx',null),('2602','6','798','1','25330','798','5','0.0','atSwfYVJbCjI5gfj6S2tjxPE',null),('2602','6','798','2','74285','798','5','0.0','KqMS78XCkxexHrRxSXsQfMT6',null),('2602','6','798','3','10538','798','5','0.0','Foz8BVQA1qsu8lH80MZWXiC6',null),('2602',
2022-04-05 12:06:05.719 GMT [1] LOG: terminating any other active server processes
2022-04-05 12:06:05.775 GMT [1] LOG: all server processes terminated; reinitializing
2022-04-05 12:06:06.024 GMT [249] LOG: database system was interrupted; last known up at 2022-04-05 12:05:35 GMT
2022-04-05 12:06:08.718 GMT [249] LOG: database system was not properly shut down; automatic recovery in progress
2022-04-05 12:06:08.718 GMT [249] LOG: redo starts at 9/F358258
2022-04-05 12:07:19.931 GMT [258] FATAL: the database system is in recovery mode
2022-04-05 12:07:39.762 GMT [261] FATAL: the database system is in recovery mode
2022-04-05 12:07:44.540 GMT [263] FATAL: the database system is in recovery mode
2022-04-05 12:07:56.819 GMT [265] FATAL: the database system is in recovery mode
2022-04-05 12:08:47.695 GMT [249] LOG: redo done at 14/74451FB0 system usage: CPU: user: 133.00 s, system: 25.88 s, elapsed: 158.97 s
2022-04-05 12:08:55.073 GMT [1] LOG: issuing SIGKILL to recalcitrant children
2022-04-05 12:08:55.085 GMT [1] LOG: all server processes terminated; reinitializing
2022-04-05 12:08:55.314 GMT [273] LOG: database system shutdown was interrupted; last known up at 2022-04-05 12:08:47 GMT
2022-04-05 12:08:56.072 GMT [273] LOG: database system was not properly shut down; automatic recovery in progress
2022-04-05 12:08:56.073 GMT [273] LOG: redo starts at 9/F358258
2022-04-05 12:08:58.177 GMT [275] FATAL: the database system is in recovery mode
2022-04-05 12:09:02.147 GMT [277] FATAL: the database system is in recovery mode
2022-04-05 12:09:04.930 GMT [279] FATAL: the database system is in recovery mode
2022-04-05 12:11:29.808 GMT [273] LOG: successfully skipped missing contrecord at 14/74451FF8, overwritten at 2022-04-05 12:08:47.699097+00
2022-04-05 12:11:29.808 GMT [273] CONTEXT: WAL redo at 14/74452018 for XLOG/OVERWRITE_CONTRECORD: lsn 14/74451FF8; time 2022-04-05 12:08:47.699097+00
2022-04-05 12:11:29.808 GMT [273] LOG: invalid record length at 14/74452048: wanted 24, got 0
2022-04-05 12:11:29.808 GMT [273] LOG: redo done at 14/74452018 system usage: CPU: user: 131.11 s, system: 22.60 s, elapsed: 153.73 s
2022-04-05 12:12:29.781 GMT [1] LOG: database system is ready to accept connections
It seems that huge pages don't work with k8s: https://github.com/kubernetes/kubernetes/issues/71233#issuecomment-447472125
Perhaps also https://github.com/docker-library/postgres/issues/451#issuecomment-447472044 and other comments there.
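For anyone who just wants to rule huge pages out on the PostgreSQL side: huge_pages is a standard server parameter (default "try"), and this image passes extra command-line flags straight to postgres, so a sketch of the usual way to disable it would be (the reporter explains below why that's not an option for this comparison):

```sh
# Sketch: start the image with huge page support disabled
# ("off" makes PostgreSQL fall back to regular pages instead of trying huge pages)
docker run -d --name some-postgres -e POSTGRES_PASSWORD=mysecretpassword \
  postgres:14 -c huge_pages=off
```

In a Kubernetes manifest the same flag would go into the container's args.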
> kubernetes/kubernetes#71233 (comment) It seems that huge pages don't work with k8s

I don't agree with you; k8s actually works fine with MySQL with huge pages on.

> Perhaps also #451 (comment) and other comments there.

The background is that I'm running an apples-to-apples workload comparison with huge pages on vs. off, so I can't simply turn it off.
Closing in favor of https://github.com/docker-library/docs/issues/2355 (we're not going to change the image / default configuration for this because we try to follow upstream's defaults faithfully, but better documentation would absolutely be appropriate :heart:).