carbondata icon indicating copy to clipboard operation
carbondata copied to clipboard

[CARBONDATA-4260] Support CarbonData on Trino

Open czy006 opened this issue 3 years ago • 18 comments

Why is this PR needed? PrestoSQL has now changed its name to Trino. Because Facebook established the Presto Foundation at The Linux Foundation®,Led to prestosql Must be change the name. More information can see here

replace the old PrestoSQL implementation with a new Trino

CarbonData currently supports querying Hive/Spark data using PrestoSQL (316), but many new features and performance improvements are not available to users due to the new version of CarbonData

We will first provide a beta version of CarbonData on the summer-2021 branch. You are welcome to use the test, wait for the completion of the test and use cases, we will officially merge into the trunk branch when the functionality is stable

More information:https://issues.apache.org/jira/browse/CARBONDATA-4260

What changes were proposed in this PR? Java 11 removed ConstructorAccessor class,use unsafe class for reflection Rename the file Trino beginning with Presto Unify file hump case style and Divide component package names CarbonDataSplitManager add properties serialization.lib about StorageFormatSerDe TrinoFilterUtil Logical Changes to determine Rang Maven Pom Support Trino 358

Does this PR introduce any user interface change? No Is any new testcase added? No

czy006 avatar Aug 08 '21 11:08 czy006

retest this please

czy006 avatar Aug 08 '21 11:08 czy006

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/5830/

CarbonDataQA2 avatar Aug 08 '21 14:08 CarbonDataQA2

Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/4085/

CarbonDataQA2 avatar Aug 08 '21 14:08 CarbonDataQA2

Build Failed with Spark 3.1, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_3.1/232/

CarbonDataQA2 avatar Aug 08 '21 14:08 CarbonDataQA2

retest this please

czy006 avatar Aug 10 '21 11:08 czy006

Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/4094/

CarbonDataQA2 avatar Aug 10 '21 13:08 CarbonDataQA2

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/5838/

CarbonDataQA2 avatar Aug 10 '21 13:08 CarbonDataQA2

Build Failed with Spark 3.1, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_3.1/242/

CarbonDataQA2 avatar Aug 10 '21 14:08 CarbonDataQA2

retest this please

czy006 avatar Aug 11 '21 06:08 czy006

Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/4098/

CarbonDataQA2 avatar Aug 11 '21 08:08 CarbonDataQA2

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/5842/

CarbonDataQA2 avatar Aug 11 '21 09:08 CarbonDataQA2

Build Failed with Spark 3.1, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_3.1/246/

CarbonDataQA2 avatar Aug 11 '21 09:08 CarbonDataQA2

Build Failed Reason Report:Jenkins evn don't support jdk11 , just have jdk8. Maybe this reason to build failes. image

czy006 avatar Aug 11 '21 09:08 czy006

build mvn command : mvn clean install -pl :trino -am -Djacoco.skip=true -Dfindbugs.skip=true -DskipTests=true -Dspark.version=2.4.5 -Dhadoop.version=2.7.7 -Dhive.version=3.1.0 -Dscala.version=2.11.12 maven-version:3.8.1 jdk-version:openjdk-11.0.11 image image

czy006 avatar Aug 11 '21 12:08 czy006

@czy006 thanks for your contribution, it would be helpful if you can raise one discussion in community on your approach to integrate Trino

Ex: https://www.mail-archive.com/[email protected]/msg02219.html https://www.mail-archive.com/[email protected]/msg01835.html i have requested @ajantha-bhat to review your design approach and PR

brijoobopanna avatar Aug 11 '21 12:08 brijoobopanna

@brijoobopanna Thanks for your reply and recognition 😁 This is first time to contributing code, and i meet some problems during development and contribution. First, i want to know whether support directive builds when building PR, or JDK11. In many cases, mismatch of JDK or Maven version make build fail. I'll attached instructions and design documents later. I will add some instructions and design documents at a later date. I will develop this feature until replace the current PrestoSQL. Looking forward your next reply.

This graph is my local basic run base test:

czy006 avatar Aug 11 '21 15:08 czy006

@czy006 : Thanks for your interest in this. I have not reviewed PR line by line, But I do have some suggestions

  1. I see that you are updating to trino 358, already 360 has released. Their release cycle is very fast. So, it will be hard to keep up. I want you to raise a discussion in dev and user mail list to see what version majority of people want to use it and do we need to support current prestosql 333 and prestodb also ?

  2. Also we can't keep support for older version for long time, code becomes hard to maintain. We already have prestodb , prestosql, now trino folder.

  3. Better to extract the common code (similar to we did for prestodb and prestosql) instead of duplicating all the classes.

  4. Also your integration is high level integration! we need to make changes for carbon to use the new features that comes along with newer version of trino, like dynamic filtering, Rubix cache integration etc.

  5. In the cluster you can test both insert flow as well as query flow to make sure functionalities are not broken, sometime unit testes are not enough (based on my experiences on presto integration)

  6. Regrading your suggestion for adding JDK11 in PR builder, currently it is not possible as other modules of carbondata are not JDK11compatible yet. we get many compilation error when we compile all modules of carbondata. Example, /repo/Carbon/format/target/gen-java/org/apache/carbondata/format/DataChunk.java:[32,24] package javax.annotation does not exist

presto 333 also needs JDK 11 for run time, so what I did is that only compile carbondata-presto jars in JDK11. For that first compile whole project in JDK8, then export JAVA_HOME path to JDK11 and run the same maven command with -pl integration/presto (also as I mentioned previously in https://github.com/apache/carbondata/issues/4184, use spark-2.3 profile)

ajantha-bhat avatar Aug 13 '21 13:08 ajantha-bhat

@ajantha-bhat Thank you for your opinion.Your suggestion is very helpful to me, a good community should be discussed together. So I created this discussion on the mailing list earlier and welcome us to discuss this feature here. Some thoughts on me will be added here later . Or you can create a discussion for us to join

czy006 avatar Aug 13 '21 16:08 czy006