
Spark 3.3.0 support

Open chncaesar opened this issue 2 years ago • 3 comments

Proposed changes

  1. Support Spark 3.3.0. Removed log4j 1.x in favor of Spark's Logging trait, which uses log4j 2.x in Spark 3.3.0. This change does not break compatibility with older Spark versions. The code changes are in ScalaValueReader.scala.

  2. Close BufferedReader in DorisStreamLoad. The BufferedReader that reads the Doris BE REST API response in the loadBatch function of DorisStreamLoad should be closed after use.

  3. Change spark.minor.version to spark.major.version. In pom.xml, the property spark.minor.version actually holds the Spark major version.

  4. Make the source jar include the Scala code. The change is in the scala-maven-plugin configuration in pom.xml.
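To illustrate item 2, here is a minimal sketch of how a response reader can be closed reliably with try-with-resources. It is not the connector's actual loadBatch code: the class name ResponseReaderSketch, the helper readResponse, and the sample JSON body are all hypothetical and only show the general pattern.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; not the code in DorisStreamLoad.
class ResponseReaderSketch {

    // Reads the whole response body line by line and returns it as one string.
    // try-with-resources closes the BufferedReader (and the wrapped stream)
    // even if readLine() throws.
    static String readResponse(InputStream in) {
        StringBuilder response = new StringBuilder();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                response.append(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return response.toString();
    }

    public static void main(String[] args) {
        // Assumed sample body; a real BE response may look different.
        byte[] body = "{\"Status\":\"Success\"}".getBytes(StandardCharsets.UTF_8);
        System.out.println(readResponse(new ByteArrayInputStream(body)));
    }
}
```

Closing the outermost reader is enough here, because BufferedReader.close() also closes the InputStreamReader and the underlying stream.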

Issue Number: close #xxx

Problem Summary:

This PR upgrades the code to support Spark 3.3.0 and includes a few other minor changes.

Checklist(Required)

  1. Does it affect the original behavior: (Yes/No/I Don't know) No

  2. Has unit tests been added: (Yes/No/No Need) No unit tests were added, but the change was tested manually in the spark-sql CLI.

  3. Has document been added or modified: (Yes/No/No Need)

  4. Does it need to update dependencies: (Yes/No)

  5. Are there any changes that cannot be rolled back: (Yes/No)

Further comments

If this is a relatively large or complex change, kick off the discussion at [email protected] by explaining why you chose the solution you did and what alternatives you considered, etc...

Test results:

Versions:

  • spark-3.3.0-bin-hadoop3
  • JDK 1.8
  1. Truncate the Doris table from the CLI. [image]

  2. Create a Spark view and insert data into the Doris table. Start the spark-sql CLI in local mode and execute:

CREATE TEMPORARY VIEW spark_doris
USING doris
OPTIONS(
  "table.identifier"="zjc_1.table_hash",
  "fenodes"="localhost:8030",
  "user"="zjc",
  "password"="******"
);
insert into spark_doris select 5,15.0;
  3. Check the data in Doris. [image]

  4. Select the data in spark-sql: select * from spark_doris; [image]

How to build spark-doris-connector for Spark 3.3.0

Run the command: sh build.sh --spark 3.3.0 --scala 2.12

chncaesar, Dec 12 '22 13:12

Please modify the spark.minor.version name in the build.sh script

hf200012, Dec 13 '22 03:12

Hello, thank you for your contribution, can you resolve the conflict?

JNSimba, Feb 22 '23 14:02

Hi, any further plans or progress on this PR?

Also, not all of the listed changes are about adding Spark 3.3 support; it would be better to split them into several smaller PRs.

bowenliang123, Mar 27 '23 01:03