[Bug] [Module Name] Doris insert error caused by data that includes \n

Open gitfortian opened this issue 2 years ago • 9 comments

Search before asking

  • [X] I had searched in the issues and found no similar issues.

What happened

Reason: actual column number is less than schema column number.actual number: 1, column separator: [\t], line delimiter: [ ], schema number: 57;

Setting doris.strip_outer_array="true" and doris.format="json" in the SeaTunnel config does not work either.

SeaTunnel Version

2.1.3

SeaTunnel Config

env {
  # You can set spark configuration here
  # see available properties defined by spark: https://spark.apache.org/docs/latest/configuration.html#available-properties
  spark.app.name = "SeaTunnel"
  spark.executor.instances = 2
  spark.executor.cores = 8
  spark.executor.memory = "4g"
}

source {
  jdbc {
    driver = "com.mysql.jdbc.Driver"
    url = "jdbc:mysql://127.0.0.1:3306/test"
    user = "test"
    password = "xxx"
    table = "test"
    result_table_name = "test"
  }
}
transform {

}

sink {
    Doris {
    fenodes="127.0.0.1:8031"
    database="test"
    table="test"
    user="test"
    password="xxx"
    batch_size=1000

    doris.column_separator="\t"
    # or, alternatively, the two settings below
    doris.strip_outer_array=true
    doris.format="json"
 }
}

Running Command

./bin/start-seatunnel-spark.sh \
--master local[4] \
--deploy-mode client \
--config ./config/mysql2dorisspark.conf

Error Exception

Reason: actual column number is less than schema column number.actual number: 1, column separator: [\t], line delimiter: [
], schema number: 57; . src line https://miniprogtest.xxx.chinaunicom.cn/wechat-server/\x01\N\x01\N\x01\N\x01\N\x01\N\x01\N\x01\N\x01\N\x01\N\x01\N\x01false\x01\N\x01admin\x01import\x012022-06-08

Flink or Spark Version

spark 2.4

Java or Scala Version

java 1.8 scala 2.11.12

Screenshots

No response

Are you willing to submit PR?

  • [ ] Yes I am willing to submit a PR!

Code of Conduct

gitfortian avatar Oct 12 '22 10:10 gitfortian

Could you share some sample records from MySQL? @gitfortian

CallMeKingsley97 avatar Oct 14 '22 09:10 CallMeKingsley97

You can insert a record that has a string-type field whose content includes \n; that will cause this issue.

gitfortian avatar Oct 24 '22 06:10 gitfortian

MySQL? I don't understand.

gitfortian avatar Oct 24 '22 06:10 gitfortian

Could you share some sample records from MySQL? @gitfortian

I looked at the Spark connector code and found that the problem is caused by splitting content on \n by default. This hard-coding is not reasonable. Could you support inserting data by setting properties, as the Flink connector does with 'sink.properties.format' = 'json', 'sink.properties.read_json_by_line' = 'true'?

gitfortian avatar Oct 24 '22 06:10 gitfortian
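
To illustrate why the JSON-by-line format requested above would sidestep the delimiter problem, here is a minimal Python sketch. It is not the connector's code: the sample rows and the `to_json_lines` helper are hypothetical. The only point is that JSON encoding escapes an embedded newline, so each record stays on one physical line.

```python
import json


def to_json_lines(rows):
    """Serialize each row as one JSON object per line (hypothetical helper)."""
    # json.dumps escapes a raw newline inside a string as the two characters
    # backslash + n, so the only real newlines left are the record separators.
    return "\n".join(json.dumps(row) for row in rows)


rows = [
    {"id": 1, "note": "first line\nsecond line"},  # field with an embedded newline
    {"id": 2, "note": "plain"},
]

payload = to_json_lines(rows)
lines = payload.split("\n")

# One physical line per record, and the original value survives a round trip.
print(len(lines))
print(json.loads(lines[0])["note"] == rows[0]["note"])
```

Whether the SeaTunnel 2.1.3 Spark sink can forward such properties is exactly what this thread is asking, so treat this as an illustration of the idea only.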

Could you share some sample records from MySQL? @gitfortian

I looked at the Spark connector code and found that the problem is caused by splitting content on \n by default. This hard-coding is not reasonable. Could you support inserting data by setting properties, as the Flink connector does with 'sink.properties.format' = 'json', 'sink.properties.read_json_by_line' = 'true'?

Could you share some sample records from MySQL? @gitfortian doris

gitfortian avatar Oct 24 '22 06:10 gitfortian

mkString("\n") is what causes the issue.

gitfortian avatar Oct 24 '22 06:10 gitfortian
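
A rough Python sketch of the failure mode described in the comment above, using `str.join` as a stand-in for Scala's `mkString`. The rows, the `to_delimited_payload` helper, and the three-column layout are assumptions made for illustration; the real connector code is not reproduced here.

```python
COLUMN_SEPARATOR = "\t"
LINE_DELIMITER = "\n"


def to_delimited_payload(rows):
    """Join fields with the column separator and rows with the line delimiter."""
    return LINE_DELIMITER.join(
        COLUMN_SEPARATOR.join(fields) for fields in rows
    )


rows = [
    ["1", "alice", "line one\nline two"],  # third field contains a newline
    ["2", "bob", "plain"],
]

payload = to_delimited_payload(rows)

# A loader that splits on the line delimiter sees three lines, not two rows.
parsed = [line.split(COLUMN_SEPARATOR) for line in payload.split(LINE_DELIMITER)]
column_counts = [len(fields) for fields in parsed]
print(column_counts)
```

The fragment after the embedded newline parses as a single column, which has the same shape as the "actual number: 1" message in the reported error.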

mkString("\n") is what causes the issue.

Thanks, I'll try to fix it

CallMeKingsley97 avatar Oct 28 '22 10:10 CallMeKingsley97

mkString("\n") is what causes the issue.

But when I try to insert records that contain "\n" into Doris, the insert succeeds. (Spark)

CallMeKingsley97 avatar Oct 31 '22 00:10 CallMeKingsley97

Using your record
(https://miniprogtest.xxx.chinaunicom.cn/wechat-server/\x01\N\x01\N\x01\N\x01\N\x01\N\x01\N\x01\N\x01\N\x01\N\x01\N\x01false\x01\N\x01admin\x01import\x012022-06-08), the insert failed and returned: Reason: null value for not null column, column=name. src line: []; So maybe the error is caused by \x01.

CallMeKingsley97 avatar Oct 31 '22 01:10 CallMeKingsley97
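
One possible reading of the \x01 hypothesis, sketched in Python. It assumes that the `\x01` shown in the log stands for the byte 0x01 used between fields, while the load was told to split on `\t` (as the reported "column separator: [\t]" suggests). Neither assumption is confirmed in this thread, and the field list below is rebuilt by hand from the reported src line.

```python
NULL_MARKER = r"\N"  # raw string: the two characters backslash and N

# Rebuild a line shaped like the reported one, with 0x01 between fields.
fields = (
    ["https://miniprogtest.xxx.chinaunicom.cn/wechat-server/"]
    + [NULL_MARKER] * 10
    + ["false", NULL_MARKER, "admin", "import", "2022-06-08"]
)
src_line = "\x01".join(fields)

# Splitting on a tab finds no separator at all, so the whole line is one column.
by_tab = src_line.split("\t")
# Splitting on 0x01 recovers every field.
by_soh = src_line.split("\x01")

print(len(by_tab))
print(len(by_soh) == len(fields))
```

Under these assumptions a separator mismatch alone would produce a one-column line, independently of any embedded newline, so the two causes discussed in the thread may both be in play.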

This issue has been automatically marked as stale because it has not had recent activity for 30 days. It will be closed in next 7 days if no further activity occurs.

github-actions[bot] avatar Dec 01 '22 00:12 github-actions[bot]

This issue has been closed because it has not received response for too long time. You could reopen it if you encountered similar problems in the future.

github-actions[bot] avatar Dec 14 '22 00:12 github-actions[bot]