ch-benchmark data verification failed in chaos mesh test

Open xuefengze opened this issue 11 months ago • 1 comments

The chaos-mesh q4 test failed (ch-benchmark-pg-cdc). The experiment made the meta node unavailable for 20 seconds. https://buildkite.com/risingwave-test/chaos-mesh/builds/624#018de6cc-6943-4c1e-8d63-680410204e81

================================================================================
chaos-mesh Result
================================================================================
Result               FAIL                
Pipeline Message     Nightly ch-benchmark-pg-cdc
Namespace            longcmkf-20240226-190341
TestBed              medium-arm-all-affinity
RW Version           nightly-20240226    
Test Start time      2024-02-26 19:07:59 
Test End time        2024-02-26 19:44:44 
Test Queries         q1,q2,q3,q4,q5,q6,q7,q8,q9,q10,q11,q12,q13,q14,q15,q17,q18,q19,q20,q21,q22
Grafana Metric       https://grafana.test.risingwave-cloud.xyz/d/EpkBw5W4k/risingwave-dev-dashboard?orgId=1&var-datasource=Prometheus:%20test-useast1-eks-a&var-namespace=longcmkf-20240226-190341&from=1708974479000&to=1708976684000
Grafana Logs         https://grafana.test.risingwave-cloud.xyz/d/liz0yRCZz1/log-search-dashboard?orgId=1&var-data_source=Logging:%20test-useast1-eks-a&var-namespace=longcmkf-20240226-190341&from=1708974479000&to=1708976684000
Memory Dumps         https://s3.console.aws.amazon.com/s3/buckets/test-useast1-mgmt-bucket-archiver?region=us-east-1&bucketType=general&prefix=k8s/longcmkf-20240226-190341/&showversions=false
Buildkite Job        https://buildkite.com/risingwave-test/chaos-mesh/builds/624
{
    "url": "postgres://postgres:[email protected]:5432/postgres",
    "database-name": "postgres",
    "database-checksum": -8835620199152605607,
    "table-checksums": [
        {
            "url": "postgres://postgres:[email protected]:5432/postgres",
            "table-name": "ch_benchmark_q4",
            "table-checksum": -8835620199152605607,
            "table-rows": 11
        }
    ]
}
{
    "url": "postgres://root:@127.0.0.1:4567/dev",
    "database-name": "dev",
    "database-checksum": -6045538533904742838,
    "table-checksums": [
        {
            "url": "postgres://root:@127.0.0.1:4567/dev",
            "table-name": "ch_benchmark_q4",
            "table-checksum": -6045538533904742838,
            "table-rows": 11
        }
    ]
}
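
For reference, the two reports above can be compared mechanically. The following is a minimal sketch, assuming both reports are available as JSON strings in the shape shown above; the function name `diff_table_checksums` is hypothetical and is not part of the test harness, and the `url` fields are omitted from the sample data.

```python
import json


def diff_table_checksums(upstream_report: str, downstream_report: str) -> list[str]:
    """Return the names of tables whose checksum or row count differs.

    Hypothetical helper: it only assumes the report layout shown in this
    issue ("table-checksums" entries with "table-name", "table-checksum"
    and "table-rows").
    """

    def by_table(report: str) -> dict:
        return {
            entry["table-name"]: (entry["table-checksum"], entry["table-rows"])
            for entry in json.loads(report)["table-checksums"]
        }

    up = by_table(upstream_report)
    down = by_table(downstream_report)
    # A table missing on one side also counts as a mismatch.
    return sorted(name for name in up.keys() | down.keys() if up.get(name) != down.get(name))


# Values copied from the two reports above (upstream Postgres vs. RisingWave).
upstream = json.dumps({
    "database-name": "postgres",
    "table-checksums": [
        {"table-name": "ch_benchmark_q4", "table-checksum": -8835620199152605607, "table-rows": 11}
    ],
})
downstream = json.dumps({
    "database-name": "dev",
    "table-checksums": [
        {"table-name": "ch_benchmark_q4", "table-checksum": -6045538533904742838, "table-rows": 11}
    ],
})

mismatched = diff_table_checksums(upstream, downstream)
print(mismatched)
```

Here both sides report 11 rows, so the mismatch comes from the row contents rather than from missing or extra rows.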

xuefengze avatar Feb 27 '24 01:02 xuefengze

We need to add an integrity check for the base CDC tables in addition to the check on the q4 query result, so we can tell whether the problem lies in CDC or in the query itself.

Related: #15245, #15190
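
A rough sketch of what such a combined check could look like, assuming rows from both databases have already been fetched into Python tuples with matching types. The checksum scheme (a sum of per-row SHA-256 digests) and the table name `orders` are assumptions for illustration only; this is not how the existing checksum tool is known to work.

```python
import hashlib


def row_digest(row: tuple) -> int:
    """Hash one row to an unsigned 64-bit integer (assumed scheme)."""
    data = repr(row).encode("utf-8")
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def table_checksum(rows) -> int:
    """Order-independent checksum: sum of row digests modulo 2**64."""
    total = 0
    for row in rows:
        total = (total + row_digest(row)) % (1 << 64)
    return total


def verify_tables(upstream: dict, downstream: dict) -> dict:
    """Map each table name to True if both sides have the same checksum."""
    return {
        name: table_checksum(upstream[name]) == table_checksum(downstream.get(name, []))
        for name in upstream
    }


# Hypothetical data: the base table matches (in a different row order),
# while the q4 result differs.
upstream_tables = {
    "orders": [(1, "a"), (2, "b")],
    "ch_benchmark_q4": [(1, 10), (2, 20)],
}
downstream_tables = {
    "orders": [(2, "b"), (1, "a")],
    "ch_benchmark_q4": [(1, 10), (2, 21)],
}

result = verify_tables(upstream_tables, downstream_tables)
print(result)
```

If the base tables match while the q4 result does not, the discrepancy points at the query side; if a base table already differs, CDC ingestion is the first suspect.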

lmatz avatar Feb 27 '24 02:02 lmatz

> We need to add an integrity check for the base CDC tables in addition to the check on the q4 query result, so we can tell whether the problem lies in CDC or in the query itself.
>
> Related: #15245, #15190

Why do we only check the checksum of q4? I remember we also checked the source tables before. Please add a check for the source tables and reproduce the issue. cc @cyliu0

StrikeW avatar Apr 01 '24 03:04 StrikeW

Already added to the test; see https://github.com/risingwavelabs/risingwave/issues/15312. Closing this one since it may be outdated.

cyliu0 avatar Apr 01 '24 03:04 cyliu0