[Bug]: tpcc stability test report 'Duplicate entry '3a15013a15033a160d66' for key '__mo_cpkey_col'' on distributed mode
Is there an existing issue for the same bug?
- [X] I have checked the existing issues.
Branch Name
1.1-dev
Commit ID
9d120d4
Other Environment Information
- Hardware parameters:
3*CN: 16C 64G
1*DN: 16C 64G
3*LOG: 4C 16G
- OS type:
- Others:
Actual Behavior
During the stability test in distributed mode, the tpcc test reported errors such as "Duplicate entry '3a15013a15033a160d66' for key '__mo_cpkey_col'".
tpcc-longrunning-test/mo-tpcc/benchmarksql-info.log:2023-12-20 14:03:33 FATAL jTPCCTerminal:325 - [UNEXPECTED][TT_NEW_ORDER][EXECUTION] ErrorCode : 1062, ErrorMessage : Duplicate entry '3a15013a15013a161336' for key '__mo_cpkey_col'
tpcc-longrunning-test/mo-tpcc/benchmarksql-info.log:2023-12-22 03:49:56 FATAL jTPCCTerminal:325 - [UNEXPECTED][TT_NEW_ORDER][EXECUTION] ErrorCode : 1062, ErrorMessage : Duplicate entry '(1,1,7575,1)' for key '__mo_cpkey_col'
tpcc-longrunning-test/mo-tpcc/benchmarksql-info.log:2023-12-22 03:49:56 FATAL jTPCCTerminal:325 - [UNEXPECTED][TT_NEW_ORDER][EXECUTION] ErrorCode : 1062, ErrorMessage : Duplicate entry '(1,1,7575)' for key '__mo_cpkey_col'
tpcc-longrunning-test/mo-tpcc/tpcc.log:2023-12-22 03:49:56 FATAL jTPCCTerminal:325 - [UNEXPECTED][TT_NEW_ORDER][EXECUTION] ErrorCode : 1062, ErrorMessage : Duplicate entry '(1,1,7575,1)' for key '__mo_cpkey_col'
tpcc-longrunning-test/mo-tpcc/tpcc.log:2023-12-22 03:49:56 FATAL jTPCCTerminal:325 - [UNEXPECTED][TT_NEW_ORDER][EXECUTION] ErrorCode : 1062, ErrorMessage : Duplicate entry '(1,1,7575)' for key '__mo_cpkey_col'
tpcc-longrunning-test/mo-tpcc/benchmarksql-error-1-10.log:2023-12-20 14:03:33 FATAL jTPCCTerminal:325 - [UNEXPECTED][TT_NEW_ORDER][EXECUTION] ErrorCode : 1062, ErrorMessage : Duplicate entry '3a15013a15013a161336' for key '__mo_cpkey_col'
tpcc-longrunning-test/mo-tpcc/benchmarksql-error-1-10.log:2023-12-22 03:49:56 FATAL jTPCCTerminal:325 - [UNEXPECTED][TT_NEW_ORDER][EXECUTION] ErrorCode : 1062, ErrorMessage : Duplicate entry '(1,1,7575,1)' for key '__mo_cpkey_col'
tpcc-longrunning-test/mo-tpcc/benchmarksql-error-1-10.log:2023-12-22 03:49:56 FATAL jTPCCTerminal:325 - [UNEXPECTED][TT_NEW_ORDER][EXECUTION] ErrorCode : 1062, ErrorMessage : Duplicate entry '(1,1,7575)' for key '__mo_cpkey_col'
mo-log:
Expected Behavior
No response
Steps to Reproduce
Run the stability test in distributed mode.
Additional information
No response
@nnsgmsone please take a look at this issue
Please send me the logs, @aressu1985
mo-log: http://10.222.6.1/explore?panes=%7B%22Aoi%22:%7B%22datasource%22:%22loki%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%7Bnamespace%3D%5C%22mo-stability-regression-20240112%5C%22%7D%20%7C%3D%20%603a15013a15033a160d66%60%22,%22queryType%22:%22range%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22loki%22%7D,%22editorMode%22:%22builder%22%7D%5D,%22range%22:%7B%22from%22:%221704992471641%22,%22to%22:%221704998829377%22%7D%7D%7D&schemaVersion=1&orgId=1
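For the dup case, my next step is to add an instrumentation point that forces a core dump, so that a snapshot of the whole mo can be dumped. How exactly to dump it still needs some design work.
Still busy with production bugs; this has not been handled yet.
Design and implementation in progress.
Design and implementation in progress.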
No progress.
No progress.
Working on data correctness issues.
No progress.
No progress.
No progress.
No progress.
No progress.
Partially fixed.
dup/ww errors still occur in the distributed stability test; still locating the cause.
Blocked by snapshot read.
Not working on this
https://github.com/matrixorigin/matrixone/pull/15545
Not working on this
https://github.com/matrixorigin/matrixone/pull/15731
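The cause is as follows:
1. txn1 inserts a PK on CN1 and commits.
2. txn2 on CN2 starts running `delete pk` before this PK has been synced into the partition state. (The delete statement's snapshot ts must be smaller than txn1's commit ts; otherwise CN2 would wait for the watermark to pass txn1's commit ts.) The rowid for the PK cannot be found, so the delete finishes with affected rows = 0, which means the delete has no effect.
3. txn2 then runs `insert pk`. By the time of the dedup check, the PK committed by txn1 has been synced over, so the same PK is found in the partition state, which causes the dup error.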
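The interleaving described above can be sketched as a toy model. This is only an illustration, not MatrixOne code: `PartitionState`, `receive`, `sync`, and `delete_then_insert` are hypothetical names, timestamps and the watermark are left out, and "synced" simply means the row has become visible on CN2.

```python
from dataclasses import dataclass, field


class DuplicateEntry(Exception):
    """Raised when the insert-time dedup check finds the key already present."""


@dataclass
class PartitionState:
    """Hypothetical stand-in for CN2's local view of committed rows."""
    applied: set = field(default_factory=set)    # PKs visible on this CN
    pending: list = field(default_factory=list)  # committed elsewhere, not yet visible

    def receive(self, pk):
        # txn1 committed this PK on CN1; CN2 has not applied it yet.
        self.pending.append(pk)

    def sync(self):
        # Everything in flight becomes visible on CN2.
        self.applied.update(self.pending)
        self.pending.clear()

    def lookup(self, pk):
        return pk in self.applied


def delete_then_insert(cn2, pk, sync_between):
    """txn2 on CN2: DELETE pk, then INSERT pk. Returns the DELETE's affected rows.

    sync_between=True models txn1's row arriving between the two statements.
    """
    deleted = set()  # rows deleted by txn2 itself
    affected = 0
    if cn2.lookup(pk):  # DELETE needs to find the row first
        deleted.add(pk)
        affected = 1
    if sync_between:
        cn2.sync()
    # Dedup check before INSERT: a visible row that txn2 did not delete is a dup.
    if cn2.lookup(pk) and pk not in deleted:
        raise DuplicateEntry(f"Duplicate entry {pk!r} for key '__mo_cpkey_col'")
    return affected


if __name__ == "__main__":
    pk = (1, 1, 7575)

    racy = PartitionState()
    racy.receive(pk)  # committed on CN1, not yet visible on CN2
    try:
        delete_then_insert(racy, pk, sync_between=True)
    except DuplicateEntry as err:
        print("race:", err)

    settled = PartitionState()
    settled.receive(pk)
    settled.sync()  # row visible before txn2 starts
    print("no race, affected rows:", delete_then_insert(settled, pk, sync_between=False))
```

In the racy case the delete affects 0 rows and the later dedup check sees the row, which matches the reported symptom. When the row is visible before txn2 starts, the delete removes it and the insert passes.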
https://github.com/matrixorigin/matrixone/pull/15948
https://github.com/matrixorigin/matrixone/pull/15992 fixed dup/ww bug.
Waiting for @ouyuanning's PR.
Waiting for @ouyuanning's PR.
Waiting for @ouyuanning's PR.
In testing
Waiting for @ouyuanning's PR; will continue investigating once the issue reproduces.