tispark [BUG] The charset compatibility of tidb and tispark is not clear


Open lilinghai opened this issue 4 years ago • 1 comments

Describe the bug

When inserting a character that is 3 bytes long, such as ࠄ, using TiSpark, the insert into the binary charset table raises an exception, but the inserts into the latin1 and ascii charset tables succeed. According to the definitions of binary, latin1 and ascii, each character in these charsets is 1 byte long, so all three inserts should fail.

What did you do

Execute in TiDB:

create table t1(a char(1)) character set binary;
create table t2(a char(1)) character set latin1;
create table t3(a char(1)) character set ascii;

Execute in TiSpark:

create table st1 using tidb options(table 't1')
create table st2 using tidb options(table 't2')
create table st3 using tidb options(table 't3')
insert into st1 VALUES ('ࠄ')
insert into st2 VALUES ('ࠄ')
insert into st3 VALUES ('ࠄ')
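
The byte-length reasoning behind this reproduction can be checked outside TiDB with a small Python sketch. It is only an illustration, not TiSpark's conversion logic: the helper name `encoded_length` is hypothetical, and mapping the table charsets to Python's `latin-1` and `ascii` codecs is an assumption (TiDB/MySQL `latin1` may not match Python's codec exactly).

```python
# Sketch only: checks how the test character behaves under the Python codecs
# assumed here to stand in for the charsets of the three tables.
CHAR = "ࠄ"  # the character used in the insert statements above


def encoded_length(text, encoding):
    """Return the encoded byte length, or None if the text cannot be encoded."""
    try:
        return len(text.encode(encoding))
    except UnicodeEncodeError:
        return None


utf8_len = encoded_length(CHAR, "utf-8")      # bytes needed in UTF-8
latin1_len = encoded_length(CHAR, "latin-1")  # None if not representable
ascii_len = encoded_length(CHAR, "ascii")     # None if not representable

print("utf-8:", utf8_len, "latin-1:", latin1_len, "ascii:", ascii_len)
```

If the character cannot be represented in a single-byte charset at all, and its UTF-8 form is longer than one byte, then none of the three char(1) columns should be able to store it faithfully, which is the basis for expecting every insert to fail.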

What do you expect

All three inserts should fail.

What happens instead

insert into st1 VALUES ('ࠄ') fails; the detailed log is as follows:

Caused by: com.pingcap.tikv.exception.ConvertNotSupportException: do not support converting from [B to  com.pingcap.tikv.types.BytesType
	at com.pingcap.tikv.types.BytesType.convertToBytes(BytesType.java:111)
	at com.pingcap.tikv.types.BytesType.doConvertToTiDBType(BytesType.java:85)
	at com.pingcap.tikv.types.DataType.convertToTiDBType(DataType.java:395)
	at com.pingcap.tispark.write.TiBatchWriteTable$$anonfun$com$pingcap$tispark$write$TiBatchWriteTable$$sparkRow2TiKVRow$1.apply$mcVI$sp(TiBatchWriteTable.scala:560)
	... 30 more

insert into st2 VALUES ('ࠄ') and insert into st3 VALUES ('ࠄ') succeed.

Spark and TiSpark version info

Git Commit Hash: 9687b70202584883fe5b88d39fb62d64867ebde2
Git Branch: release-2.3
UTC Build Time: 2020-10-16 02:40:15
Supported Spark Version: 2.3 2.4
Current Spark Version: 2.4.3
Current Spark Major Version: 2.4
TimeZone: Asia/Shanghai

TiDB: release-4.0

Nov 02 '20 08:11 lilinghai

/lifecycle frozen

Apr 27 '22 09:04 shiyuhang0
