vitess
vitess copied to clipboard
Bug Report: Inconsistent ENUM and SET handling in VStream/vstreamer
Overview of the Issue
In VReplication workflows and the VStream API there is a copy phase (initial snapshot copy) and a running/replicating phase. You can read more about the lifecycle here.
For the copy phase, a rowstreamer (a specific type of vstreamer) streams the rows resulting from a query used to get a consistent snapshot of the table at that logical point in time (a GTID set which matches the consistent snapshot of the table). For the running/replicating phase a standard vstreamer streams filtered binary log events from the source tablet to the target tablet(s).
Because the rowstreamer sends the results of a query, for ENUM and SET columns it sends the string value that it gets back from MySQL. Because the vstreamer sends filtered binary log events, for ENUM and SET columns it sends the integer based value as that's what is in the binary log events.
This has been (partially) addressed in the past on the consumer side. For VReplication workflows the consumers are vcopier for the rowstreamer and vplayer for the vstreamer and the vplayer started doing the integer to string mapping in https://github.com/vitessio/vitess/issues/15349 (this was done as it caused problems for Vitess OnlineDDL when the schema change was shuffling the order of elements in the ENUM around). In VStream consumers such as the Debezium Vitess connector and the PlanetScale Airbyte connector this mapping was done when processing the incoming VEvents (debezium connector PR, airbyte connector PR).
Rather than pushing this work on to each current and future consumer, we should unify the behavior in the VReplication vstreamer and its subtypes such as rowstreamer.
Reproduction Steps
Setup the test env:
git checkout main && make build
pushd examples/local
./101_initial_cluster.sh
mysql < ../common/insert_commerce_data.sql
vtctldclient ApplySchema --ddl-strategy=direct --sql "alter table customer add size enum ('small', 'medium', 'large') default 'medium'" commerce
Modify the vstream_client example to stream everything from the customer table in the unsharded commerce keyspace:
diff --git a/examples/local/vstream_client.go b/examples/local/vstream_client.go
index 98d2129f89..939178159f 100644
--- a/examples/local/vstream_client.go
+++ b/examples/local/vstream_client.go
@@ -38,7 +38,7 @@ import (
*/
func main() {
ctx := context.Background()
- streamCustomer := true
+ streamCustomer := false
var vgtid *binlogdatapb.VGtid
if streamCustomer {
vgtid = &binlogdatapb.VGtid{
In another Terminal, start a vtgate VStream of the customer table. It will start with the copy phase and then continue on in the running/replicating phase:
cd ${VTROOT}/examples/local
go run vstream_client.go
Generate some more data now that the vstream is past the copy phase and in the running/replicating phase:
mysql commerce
# ... perform inserts, updates, deletes against the customer table
Clean up:
./401_teardown.sh
popd
Sample results:
[type:BEGIN keyspace:"commerce" shard:"0" type:FIELD field_event:{table_name:"commerce.customer" fields:{name:"customer_id" type:INT64 table:"customer" org_table:"customer" database:"vt_commerce" org_name:"customer_id" column_length:20 charset:63 flags:49667 column_type:"bigint"} fields:{name:"email" type:VARBINARY table:"customer" org_table:"customer" database:"vt_commerce" org_name:"email" column_length:128 charset:63 flags:128 column_type:"varbinary(128)"} fields:{name:"size" type:ENUM table:"customer" org_table:"customer" database:"vt_commerce" org_name:"size" column_length:24 charset:255 flags:256 column_type:"enum('small','medium','large')"} keyspace:"commerce" shard:"0"} keyspace:"commerce" shard:"0"]
[type:VGTID vgtid:{shard_gtids:{keyspace:"commerce" shard:"0" gtid:"MySQL56/f9370c8c-9484-11ee-a06b-d03490376450:1-62"}} keyspace:"commerce" shard:"0"]
[type:ROW row_event:{table_name:"commerce.customer" row_changes:{after:{lengths:1 lengths:16 lengths:6 values:"[email protected]"}} keyspace:"commerce" shard:"0"} keyspace:"commerce" shard:"0" type:ROW row_event:{table_name:"commerce.customer" row_changes:{after:{lengths:1 lengths:14 lengths:6 values:"[email protected]"}} keyspace:"commerce" shard:"0"} keyspace:"commerce" shard:"0" type:ROW row_event:{table_name:"commerce.customer" row_changes:{after:{lengths:1 lengths:18 lengths:6 values:"[email protected]"}} keyspace:"commerce" shard:"0"} keyspace:"commerce" shard:"0" type:ROW row_event:{table_name:"commerce.customer" row_changes:{after:{lengths:1 lengths:14 lengths:6 values:"[email protected]"}} keyspace:"commerce" shard:"0"} keyspace:"commerce" shard:"0" type:ROW row_event:{table_name:"commerce.customer" row_changes:{after:{lengths:1 lengths:14 lengths:6 values:"[email protected]"}} keyspace:"commerce" shard:"0"} keyspace:"commerce" shard:"0" type:VGTID vgtid:{shard_gtids:{keyspace:"commerce" shard:"0" gtid:"MySQL56/f9370c8c-9484-11ee-a06b-d03490376450:1-62" table_p_ks:{table_name:"customer" lastpk:{fields:{name:"customer_id" type:INT64 charset:63 flags:49667} rows:{lengths:1 values:"5"}}}}} keyspace:"commerce" shard:"0" type:COMMIT keyspace:"commerce" shard:"0"]
[type:BEGIN keyspace:"commerce" shard:"0" type:VGTID vgtid:{shard_gtids:{keyspace:"commerce" shard:"0" gtid:"MySQL56/f9370c8c-9484-11ee-a06b-d03490376450:1-62"}} keyspace:"commerce" shard:"0" type:COMMIT keyspace:"commerce" shard:"0"]
[type:COPY_COMPLETED keyspace:"commerce" shard:"0" type:COPY_COMPLETED]
[type:BEGIN timestamp:1701901111 current_time:1701901111210764000 keyspace:"commerce" shard:"0" type:FIELD timestamp:1701901111 field_event:{table_name:"commerce.customer" fields:{name:"customer_id" type:INT64 table:"customer" org_table:"customer" database:"vt_commerce" org_name:"customer_id" column_length:20 charset:63 flags:49667 column_type:"bigint"} fields:{name:"email" type:VARBINARY table:"customer" org_table:"customer" database:"vt_commerce" org_name:"email" column_length:128 charset:63 flags:128 column_type:"varbinary(128)"} fields:{name:"size" type:ENUM table:"customer" org_table:"customer" database:"vt_commerce" org_name:"size" column_length:24 charset:255 flags:256 column_type:"enum('small','medium','large')"} keyspace:"commerce" shard:"0"} current_time:1701901111213501000 keyspace:"commerce" shard:"0" type:ROW timestamp:1701901111 row_event:{table_name:"commerce.customer" row_changes:{after:{lengths:1 lengths:21 lengths:1 values:"[email protected]"}} keyspace:"commerce" shard:"0" flags:1} current_time:1701901111213523000 keyspace:"commerce" shard:"0" type:VGTID vgtid:{shard_gtids:{keyspace:"commerce" shard:"0" gtid:"MySQL56/f9370c8c-9484-11ee-a06b-d03490376450:1-63"}} keyspace:"commerce" shard:"0" type:COMMIT timestamp:1701901111 current_time:1701901111213534000 keyspace:"commerce" shard:"0"]
Binary Version
❯ vtgate --version
vtgate version Version: 20.0.0-SNAPSHOT (Git revision da1301b1cec3ba7c12b148eda94d7fbaa063e6ef branch 'main') built on Thu Apr 18 10:44:19 EDT 2024 by [email protected] using go1.22.2 darwin/arm64
Operating System and Environment details
N/A
Log Fragments
No response
Note that this will not be a breaking change for the PlanetScale Airbyte connector as it will return the value as-is when it's not an integer: https://github.com/planetscale/airbyte-source/blob/426bcdb0b34c2f26276dbfe473f1f8b5627c2344/cmd/internal/types.go#L184-L236
The same is true for the other PlanetScale connectors as they have the same enum and set handling code as the Airbyte one:
- FiveTran: https://github.com/planetscale/fivetran-source
- SingerTap: https://github.com/planetscale/singer-tap
It is, however, a breaking change for the Debezium connector as it has a hard coded assumption that the value is an integer: https://github.com/debezium/debezium-connector-vitess/blob/a88fcf6bcb383e7b9f547213c5599040a11fc68f/src/main/java/io/debezium/connector/vitess/VitessValueConverter.java#L179-L257
This issue is being discussed with the Debezium community here: https://debezium.zulipchat.com/#narrow/stream/348255-community-vitess/topic/VStream.20ENUM.20.2F.20SET.20Changes