tispark
[BUG] The charset compatibility of tidb and tispark is not clear
Describe the bug
When inserting a character whose encoded length is 3 bytes, such as ࠄ, using tispark, the insert into the binary charset table raises an exception, but the inserts into the latin1 and ascii charset tables succeed. According to the definitions of binary, latin1 and ascii, each of them uses 1 byte per character, so a 3-byte character cannot fit in char(1) and all three inserts should fail.
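The byte-length claim can be checked outside of TiDB and tispark with a minimal Java sketch. It assumes the character is encoded as UTF-8 on the Spark side, and it uses Java's ISO-8859-1 and US-ASCII charsets as stand-ins for the latin1 and ascii table charsets (TiDB's own handling of latin1 may differ in detail). The class and method names are hypothetical and not part of tispark.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only; not part of tispark.
class CharsetCheck {
    // Number of bytes the string occupies when encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // True if every character of s can be represented in the given charset.
    static boolean representable(String s, Charset cs) {
        return cs.newEncoder().canEncode(s);
    }

    public static void main(String[] args) {
        String ch = "\u0804"; // the character ࠄ from the report
        System.out.println("UTF-8 bytes: " + utf8Length(ch));
        // ISO-8859-1 and US-ASCII are assumed stand-ins for latin1 and ascii.
        System.out.println("fits latin1: " + representable(ch, StandardCharsets.ISO_8859_1));
        System.out.println("fits ascii: " + representable(ch, StandardCharsets.US_ASCII));
    }
}
```

Under these assumptions the character takes 3 bytes in UTF-8 and is not representable in either single-byte charset, which is consistent with the expectation that none of the three char(1) columns can hold it.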
What did you do
- execute in tidb:
  create table t1(a char(1)) character set binary;
  create table t2(a char(1)) character set latin1;
  create table t3(a char(1)) character set ascii;
- execute in tispark:
  create table st1 using tidb options(table 't1')
  create table st2 using tidb options(table 't2')
  create table st3 using tidb options(table 't3')
  insert into st1 VALUES ('ࠄ')
  insert into st2 VALUES ('ࠄ')
  insert into st3 VALUES ('ࠄ')
What do you expect
All three inserts should fail.
What happens instead
insert into st1 VALUES ('ࠄ') fails. The detailed log is as follows:
Caused by: com.pingcap.tikv.exception.ConvertNotSupportException: do not support converting from [B to com.pingcap.tikv.types.BytesType
at com.pingcap.tikv.types.BytesType.convertToBytes(BytesType.java:111)
at com.pingcap.tikv.types.BytesType.doConvertToTiDBType(BytesType.java:85)
at com.pingcap.tikv.types.DataType.convertToTiDBType(DataType.java:395)
at com.pingcap.tispark.write.TiBatchWriteTable$$anonfun$com$pingcap$tispark$write$TiBatchWriteTable$$sparkRow2TiKVRow$1.apply$mcVI$sp(TiBatchWriteTable.scala:560)
... 30 more
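For readers unfamiliar with the notation in the exception message, [B is the JVM's runtime class name for byte[], so the message says that a byte array was passed to the BytesType conversion. The following minimal Java sketch only illustrates that naming; the class and method names are hypothetical, and it does not reproduce tispark's conversion logic.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only; not part of tispark.
class ArrayTypeName {
    // Runtime class name of an object, as it would appear in a log message.
    static String runtimeName(Object o) {
        return o.getClass().getName();
    }

    public static void main(String[] args) {
        // Assumes the value reaches the writer as UTF-8 bytes.
        byte[] payload = "\u0804".getBytes(StandardCharsets.UTF_8);
        System.out.println(runtimeName(payload)); // JVM name for byte[]
    }
}
```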
insert into st2 VALUES ('ࠄ') and insert into st3 VALUES ('ࠄ') are successful.
Spark and TiSpark version info
Git Commit Hash: 9687b70202584883fe5b88d39fb62d64867ebde2
Git Branch: release-2.3
UTC Build Time: 2020-10-16 02:40:15
Supported Spark Version: 2.3 2.4
Current Spark Version: 2.4.3
Current Spark Major Version: 2.4
TimeZone: Asia/Shanghai
TIDB: release-4.0
/lifecycle frozen