tiflow

Kafka consumer panic: RowChangedEvent dispatched to wrong partition

Open fubinzh opened this issue 11 months ago • 2 comments

What did you do?

  1. create kafka avro changefeed
[root@upstream-ticdc-0 /]# cat /tmp/changefeed.toml
[integrity]
integrity-check-level = "correctness"

/cdc  cli  changefeed  create "--server=127.0.0.1:8301" "--sink-uri=kafka://downstream-kafka.cdc-testbed-tps-7260025-1-4:9092/kafka-avro?avro-enable-watermark=true&protocol=avro&replication-factor=1&enable-tidb-extension=true&avro-decimal-handling-mode=string&avro-bigint-unsigned-handling-mode=string" "--changefeed-id=kafka-avro-enable-extension" "--config=/tmp/changefeed.toml" "--schema-registry=http://schemaregistry.cdc-testbed-tps-7260025-1-4:8081"
  2. run kafka consumer
nohup /cdc_kafka_consumer --upstream-uri="kafka://downstream-kafka.cdc-testbed-tps-7260025-1-4:9092/kafka-avro?protocol=avro&max-message-bytes=1048576&enable-tidb-extension=true" --downstream-uri="mysql://root:@target.cdc-testbed-tps-7260025-1-4:3306?safe-mode=true&batch-dml-enable=false" --config=/tmp/changefeed.toml --schema-registry-uri=http://schemaregistry.cdc-testbed-tps-7260025-1-4:8081 --consumer-group-id=kafka-avro-avro-kafka://downstream-kafka.cdc-testbed-tps-7260025-1-4:9092 &
  3. run workload
sysbench --db-driver=mysql --mysql-host=`nslookup upstream-tidb.cdc-testbed-tps-7260025-1-4 | awk -F: '{print $2}' | awk 'NR==5' | sed s/[[:space:]]//g`  --mysql-port=4000 --mysql-user=root --mysql-db=workload --tables=1 --table-size=100000 --create_secondary=off --debug=true --threads=10 --mysql-ignore-errors=2013,1213,1105,1205,8022,8027,8028,9004,9007,1062 oltp_write_only prepar

What did you expect to see?

kafka consumer should not panic

What did you see instead?

Kafka consumer panic

[2024/03/05 05:10:37.126 +00:00] [INFO] [main.go:589] ["start consume claim"] [topic=kafka-avro] [partition=2] [initialOffset=-2] [highWaterMarkOffset=0]
[2024/03/05 05:11:43.624 +00:00] [PANIC] [main.go:673] ["RowChangedEvent dispatched to wrong partition"] [obtained=1] [expected=2] [partitionNum=3] [row="{\"StartTs\":0,\"CommitTs\":448165445767528461,\"RowID\":0,\"PhysicalTableID\":0,\"TableInfo\":{\"id\":0,\"name\":{\"O\":\"sbtest1\",\"L\":\"sbtest1\"},\"charset\":\"\",\"collate\":\"\",\"cols\":[{\"id\":100,\"name\":{\"O\":\"id\",\"L\":\"id\"},\"offset\":0,\"origin_default\":null,\"origin_default_bit\":null,\"default\":null,\"default_bit\":null,\"default_is_expr\":false,\"generated_expr_string\":\"\",\"generated_stored\":false,\"dependences\":null,\"type\":{\"Tp\":3,\"Flag\":3,\"Flen\":0,\"Decimal\":0,\"Charset\":\"utf8mb4\",\"Collate\":\"utf8mb4_bin\",\"Elems\":null,\"ElemsIsBinaryLit\":null,\"Array\":false},\"state\":5,\"comment\":\"\",\"hidden\":false,\"change_state_info\":null,\"version\":0},{\"id\":101,\"name\":{\"O\":\"k\",\"L\":\"k\"},\"offset\":1,\"origin_default\":null,\"origin_default_bit\":null,\"default\":null,\"default_bit\":null,\"default_is_expr\":false,\"generated_expr_string\":\"\",\"generated_stored\":false,\"dependences\":null,\"type\":{\"Tp\":3,\"Flag\":1,\"Flen\":0,\"Decimal\":0,\"Charset\":\"utf8mb4\",\"Collate\":\"utf8mb4_bin\",\"Elems\":null,\"ElemsIsBinaryLit\":null,\"Array\":false},\"state\":5,\"comment\":\"\",\"hidden\":false,\"change_state_info\":null,\"version\":0},{\"id\":102,\"name\":{\"O\":\"c\",\"L\":\"c\"},\"offset\":2,\"origin_default\":null,\"origin_default_bit\":null,\"default\":null,\"default_bit\":null,\"default_is_expr\":false,\"generated_expr_string\":\"\",\"generated_stored\":false,\"dependences\":null,\"type\":{\"Tp\":15,\"Flag\":1,\"Flen\":0,\"Decimal\":0,\"Charset\":\"utf8mb4\",\"Collate\":\"utf8mb4_bin\",\"Elems\":null,\"ElemsIsBinaryLit\":null,\"Array\":false},\"state\":5,\"comment\":\"\",\"hidden\":false,\"change_state_info\":null,\"version\":0},{\"id\":103,\"name\":{\"O\":\"pad\",\"L\":\"pad\"},\"offset\":3,\"origin_default\":null,\"origin_default_bit\":null,\"default\":null,\"default_bit\":null,\"default_is_expr\":false,\"generated_expr_string\":\"\",\"generated_stored\":false,\"dependences\":null,\"type\":{\"Tp\":15,\"Flag\":1,\"Flen\":0,\"Decimal\":0,\"Charset\":\"utf8mb4\",\"Collate\":\"utf8mb4_bin\",\"Elems\":null,\"ElemsIsBinaryLit\":null,\"Array\":false},\"state\":5,\"comment\":\"\",\"hidden\":false,\"change_state_info\":null,\"version\":0}],\"index_info\":[{\"id\":2,\"idx_name\":{\"O\":\"idx_0\",\"L\":\"idx_0\"},\"tbl_name\":{\"O\":\"\",\"L\":\"\"},\"idx_cols\":[{\"name\":{\"O\":\"id\",\"L\":\"id\"},\"offset\":0,\"length\":0}],\"state\":5,\"backfill_state\":0,\"comment\":\"\",\"index_type\":0,\"is_unique\":true,\"is_primary\":true,\"is_invisible\":false,\"is_global\":false,\"mv_index\":false}],\"constraint_info\":null,\"fk_info\":null,\"state\":0,\"pk_is_handle\":false,\"is_common_handle\":true,\"common_handle_version\":0,\"comment\":\"\",\"auto_inc_id\":0,\"auto_id_cache\":0,\"auto_rand_id\":0,\"max_col_id\":0,\"max_idx_id\":0,\"max_fk_id\":0,\"max_cst_id\":0,\"update_timestamp\":0,\"ShardRowIDBits\":0,\"max_shard_row_id_bits\":0,\"auto_random_bits\":0,\"auto_random_range_bits\":0,\"pre_split_regions\":0,\"partition\":null,\"compression\":\"\",\"view\":null,\"sequence\":null,\"Lock\":null,\"version\":0,\"tiflash_replica\":null,\"is_columnar\":false,\"temp_table_type\":0,\"cache_table_status\":0,\"policy_ref_info\":null,\"stats_options\":null,\"exchange_partition_info\":null,\"ttl_info\":null,\"SchemaID\":100,\"TableName\":{\"Schema\":\"workload\",\"Table\":\"sbtest1\",\"TableID\":0,\"IsPartition\":false},\"Version\":1000,\"RowColumnsOffset\":{\"100\":0,\"101\":1,\"102\":2,\"103\":3},\"ColumnsFlag\":{\"100\":10,\"101\":0,\"102\":0,\"103\":0},\"HandleIndexID\":-1,\"IndexColumnsOffset\":[[0]]},\"Columns\":[{\"column_id\":100,\"value\":2},{\"column_id\":101,\"value\":50248},{\"column_id\":102,\"value\":\"13241531885-45658403807-79170748828-69419634012-13605813761-77983377181-01582588137-21344716829-87370944992-02457486289\"},{\"column_id\":103,\"value\":\"28733802923-10548894641-11867531929-71265603657-36546888392\"}],\"PreColumns\":null,\"Checksum\":null,\"ApproximateDataSize\":0,\"SplitTxn\":false,\"ReplicatingTs\":0,\"HandleKey\":null}"] [stack="main.(*Consumer).ConsumeClaim\n\tgithub.com/pingcap/tiflow/cmd/kafka-consumer/main.go:673\ngithub.com/IBM/sarama.(*consumerGroupSession).consume\n\tgithub.com/IBM/[email protected]/consumer_group.go:949\ngithub.com/IBM/sarama.newConsumerGroupSession.func2\n\tgithub.com/IBM/[email protected]/consumer_group.go:874"]
/ #
I have no name!@downstream-kafka-0:/$ kafka-topics.sh --describe --bootstrap-server=localhost:9092 --topic ${topic}
Topic: kafka-avro       TopicId: V9Jq5zhdROGkEOd5H7Iqkw PartitionCount: 3       ReplicationFactor: 1    Configs: segment.bytes=1073741824
        Topic: kafka-avro       Partition: 0    Leader: 2       Replicas: 2     Isr: 2
        Topic: kafka-avro       Partition: 1    Leader: 1       Replicas: 1     Isr: 1
        Topic: kafka-avro       Partition: 2    Leader: 0       Replicas: 0     Isr: 0
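
The panic fields above suggest a consumer-side consistency check: the consumer recomputes the partition a row should have been routed to and compares it with the partition the message came from. A minimal Go sketch of such a check follows. It assumes that obtained is the partition the message was read from and that expected is the recomputed value; routeByTable, checkPartition, and the FNV hash are hypothetical stand-ins for illustration, not the tiflow implementation.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// routeByTable is a hypothetical table-level dispatch rule: every row of the
// same schema and table lands on the same partition.
func routeByTable(schema, table string, partitionNum int32) int32 {
	h := fnv.New32a()
	h.Write([]byte(schema))
	h.Write([]byte(table))
	return int32(h.Sum32() % uint32(partitionNum))
}

// checkPartition returns an error when the partition a message was read from
// (obtained) differs from the partition the consumer recomputes (expected).
func checkPartition(obtained int32, schema, table string, partitionNum int32) error {
	expected := routeByTable(schema, table, partitionNum)
	if obtained != expected {
		return fmt.Errorf(
			"RowChangedEvent dispatched to wrong partition: obtained=%d expected=%d partitionNum=%d",
			obtained, expected, partitionNum)
	}
	return nil
}

func main() {
	const partitionNum = 3
	expected := routeByTable("workload", "sbtest1", partitionNum)

	// Same partition on both sides: no error.
	fmt.Println(checkPartition(expected, "workload", "sbtest1", partitionNum))

	// Message read from a neighbouring partition: the check reports a mismatch.
	fmt.Println(checkPartition((expected+1)%partitionNum, "workload", "sbtest1", partitionNum))
}
```

Returning an error instead of panicking keeps the sketch testable; a real consumer may well prefer to stop immediately, as the log above shows.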

Versions of the cluster

kafka consumer version:

["Welcome to kafka consumer"] [release-version=v8.0.0-master-dirty] [git-hash=3744f9c330ce65742216a8495a2d5b3ef4c1e933] [git-branch=decoder-checksum-debug] [utc-build-time="2024-03-04 06:11:41"] 

cdc version:

[root@upstream-ticdc-0 /]# /cdc version
Release Version: v7.1.4
Git Commit Hash: 48bcbfb23aab4153a4fce6a5725c8bb850fcf8fc
Git Branch: heads/refs/tags/v7.1.4
UTC Build Time: 2024-03-01 11:21:08
Go Version: go version go1.20.12 linux/amd64
Failpoint Build: false

fubinzh, Mar 05 '24 06:03

/assign @3AceShowHand

fubinzh, Mar 05 '24 07:03

This issue affects v7.1.1, v7.1.2, and v7.1.3, since the dispatcher does not default to the table dispatcher in those versions.

https://github.com/pingcap/tiflow/pull/9224/files

3AceShowHand, Mar 05 '24 08:03

closed by #10714

3AceShowHand, Mar 27 '24 05:03

In release-6.5, there is no need to modify the event router dispatch rule, since it is controlled by enable-old-value.

If the test cannot pass, we should make changes to the kafka consumer.

3AceShowHand, Mar 27 '24 06:03