carbondata
carbondata copied to clipboard
[CARBONDATA-4260] Support CarbonData on Trino
Why is this PR needed? PrestoSQL has now changed its name to Trino. Because Facebook established the Presto Foundation at The Linux Foundation®,Led to prestosql Must be change the name. More information can see here
replace the old PrestoSQL implementation with a new Trino
CarbonData currently supports querying Hive/Spark data using PrestoSQL (316), but many new features and performance improvements are not available to users due to the new version of CarbonData
We will first provide a beta version of CarbonData on the summer-2021 branch. You are welcome to use the test, wait for the completion of the test and use cases, we will officially merge into the trunk branch when the functionality is stable
More information:https://issues.apache.org/jira/browse/CARBONDATA-4260
What changes were proposed in this PR? Java 11 removed ConstructorAccessor class,use unsafe class for reflection Rename the file Trino beginning with Presto Unify file hump case style and Divide component package names CarbonDataSplitManager add properties serialization.lib about StorageFormatSerDe TrinoFilterUtil Logical Changes to determine Rang Maven Pom Support Trino 358
Does this PR introduce any user interface change? No Is any new testcase added? No
retest this please
Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/5830/
Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/4085/
Build Failed with Spark 3.1, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_3.1/232/
retest this please
Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/4094/
Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/5838/
Build Failed with Spark 3.1, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_3.1/242/
retest this please
Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/4098/
Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/5842/
Build Failed with Spark 3.1, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_3.1/246/
Build Failed Reason Report:Jenkins evn don't support jdk11 , just have jdk8. Maybe this reason to build failes.
build mvn command : mvn clean install -pl :trino -am -Djacoco.skip=true -Dfindbugs.skip=true -DskipTests=true -Dspark.version=2.4.5 -Dhadoop.version=2.7.7 -Dhive.version=3.1.0 -Dscala.version=2.11.12
maven-version:3.8.1
jdk-version:openjdk-11.0.11
@czy006 thanks for your contribution, it would be helpful if you can raise one discussion in community on your approach to integrate Trino
Ex: https://www.mail-archive.com/[email protected]/msg02219.html https://www.mail-archive.com/[email protected]/msg01835.html i have requested @ajantha-bhat to review your design approach and PR
@brijoobopanna Thanks for your reply and recognition 😁 This is first time to contributing code, and i meet some problems during development and contribution. First, i want to know whether support directive builds when building PR, or JDK11. In many cases, mismatch of JDK or Maven version make build fail. I'll attached instructions and design documents later. I will add some instructions and design documents at a later date. I will develop this feature until replace the current PrestoSQL. Looking forward your next reply.
This graph is my local basic run base test:
@czy006 : Thanks for your interest in this. I have not reviewed PR line by line, But I do have some suggestions
-
I see that you are updating to trino 358, already 360 has released. Their release cycle is very fast. So, it will be hard to keep up. I want you to raise a discussion in dev and user mail list to see what version majority of people want to use it and do we need to support current prestosql 333 and prestodb also ?
-
Also we can't keep support for older version for long time, code becomes hard to maintain. We already have prestodb , prestosql, now trino folder.
-
Better to extract the common code (similar to we did for prestodb and prestosql) instead of duplicating all the classes.
-
Also your integration is high level integration! we need to make changes for carbon to use the new features that comes along with newer version of trino, like dynamic filtering, Rubix cache integration etc.
-
In the cluster you can test both insert flow as well as query flow to make sure functionalities are not broken, sometime unit testes are not enough (based on my experiences on presto integration)
-
Regrading your suggestion for adding JDK11 in PR builder, currently it is not possible as other modules of carbondata are not JDK11compatible yet. we get many compilation error when we compile all modules of carbondata. Example,
/repo/Carbon/format/target/gen-java/org/apache/carbondata/format/DataChunk.java:[32,24] package javax.annotation does not exist
presto 333 also needs JDK 11 for run time, so what I did is that only compile carbondata-presto jars in JDK11.
For that first compile whole project in JDK8, then export JAVA_HOME path to JDK11 and run the same maven command with -pl integration/presto
(also as I mentioned previously in https://github.com/apache/carbondata/issues/4184, use spark-2.3 profile)
@ajantha-bhat Thank you for your opinion.Your suggestion is very helpful to me, a good community should be discussed together. So I created this discussion on the mailing list earlier and welcome us to discuss this feature here. Some thoughts on me will be added here later . Or you can create a discussion for us to join