carbondata
[CARBONDATA-3193] Cdh5.14.2 spark2.2.0 support
Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:
- [No] Any interfaces changed?
- [No] Any backward compatibility impacted?
- [Yes] Document update required?
- [Yes] Testing done
  Please provide details on
  - Whether new unit test cases have been added or why no new tests are required?
  - How it is tested? Please attach test report.
  - Is it a performance related change? Please attach the performance test report.
  - Any additional information to help reviewers in testing this change.
- [No Large Changes] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
This is only the fix we use for CDH5.14.2 with Spark 2.2.0; it concerns the way Parquet treats the data.
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1923/
Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10176/
Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2132/
@jackylk, @ravipesala Can somebody validate this against CDH5.14.2? Could you also please create a build-validation profile for the CDH libraries? Please let me know where to add it, or whether I need to trigger a different profile build.
Does carbon not support CDH5.14.2 Spark 2.2.0 using -Pspark-2.2? Did CDH change the Spark interface so that carbon can't run successfully?
@chandrasaripaka I can't find the Spark Maven dependency for CDH5.14.2, but I am able to build with the CDH Spark versions 2.2.0-cdh6.0.1 and 2.2.0.cloudera3. The only problem I found is that the Cloudera repo does not have the spark-hive-thriftserver jar, so the classes that depend on it, such as CarbonThriftServer and CarbonSQLCLIDriver, cannot compile. Apart from that, I am able to compile carbon with the 2.2.0-cdh6.0.1 and 2.2.0.cloudera3 versions. I am not sure why CDH does not include the spark-hive-thriftserver jar in their repo.
Please send the repository for CDH5.14.2, so that I can verify this version also.
Please check the repo here. https://www.cloudera.com/documentation/enterprise/release-notes/topics/cdh_vd_cdh5_maven_repo_514x.html#maven_5142
I went through this link earlier, but I cannot find a Spark 2.2 version in this distribution. I can find only Spark 1.6.0-cdh5.14.4 here.
Please try this: https://repository.cloudera.com/artifactory/cloudera-repos/org/apache/spark/spark-sql_2.11/2.2.0.cloudera2/
@chandrasaripaka, I understand the issue, but creating many duplicate files may not be a good idea because they will be difficult to maintain. I will try to do it with reflection.
One more question: I can't find the package spark-hive-thriftserver_2.11 from Cloudera, and without it we cannot run the carbon thrift server. Where can I find this package? Or is it OK if we don't run the carbon thrift server from the Cloudera distribution?
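The reflection approach mentioned above can be pictured with a minimal Java sketch. It is not the code from this PR or from #3026: the class `ReflectiveCompat` and its methods are hypothetical names, and `String` stands in for a Spark class whose methods differ between distributions. The idea is to look a method up at runtime and fall back when it is absent, instead of keeping a duplicate source file per Spark build.

```java
import java.lang.reflect.Method;
import java.util.Optional;

/** Hypothetical helper: call a method only if the class on the classpath has it. */
final class ReflectiveCompat {

    /** Looks up a public method by name and parameter types; empty if it is absent. */
    static Optional<Method> findMethod(Class<?> type, String name, Class<?>... paramTypes) {
        try {
            return Optional.of(type.getMethod(name, paramTypes));
        } catch (NoSuchMethodException e) {
            return Optional.empty();
        }
    }

    /** Invokes target.name(args) when the method exists, otherwise returns the fallback. */
    static Object invokeOrDefault(Object target, String name, Class<?>[] paramTypes,
                                  Object[] args, Object fallback) {
        Optional<Method> method = findMethod(target.getClass(), name, paramTypes);
        if (!method.isPresent()) {
            return fallback;
        }
        try {
            return method.get().invoke(target, args);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Reflective call failed: " + name, e);
        }
    }

    public static void main(String[] args) {
        Class<?>[] noParams = new Class<?>[0];
        Object[] noArgs = new Object[0];
        // String.isEmpty() exists, so it is invoked reflectively.
        System.out.println(invokeOrDefault("", "isEmpty", noParams, noArgs, null));
        // "noSuchMethod" stands in for an API that one distribution lacks.
        System.out.println(invokeOrDefault("", "noSuchMethod", noParams, noArgs, "fallback"));
    }
}
```

The trade-off is that a missing or renamed method is detected at runtime rather than at compile time, so a helper like this would normally be exercised by tests against each supported Spark build.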
@ravipesala, yes, you can let me know how you want to inherit. I think defining an interface for the wrapper would be a good fit. In our local fork we did it using the normal Spark 2.2.0 thrift server, so it is OK if we don't run the thrift server from the Cloudera distribution; that works.
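As a rough illustration of the wrapper-interface idea, the following Java sketch selects an implementation from the Spark version string. Everything in it is an assumption made for illustration: `SparkDistributionAdapter`, the two adapter classes, and `SparkAdapters.forVersion` are hypothetical names, and the `thriftServerAvailable` values only mirror what this thread reports about the jars in the Cloudera repo.

```java
import java.util.Locale;

/** Hypothetical wrapper interface hiding differences between Spark distributions. */
interface SparkDistributionAdapter {
    String describe();

    /** Whether classes that depend on spark-hive-thriftserver can be used. */
    boolean thriftServerAvailable();
}

/** Adapter for plain Apache Spark builds. */
final class ApacheSparkAdapter implements SparkDistributionAdapter {
    public String describe() { return "Apache Spark"; }
    public boolean thriftServerAvailable() { return true; }
}

/** Adapter for Cloudera builds; the thread reports no thrift server jar in that repo. */
final class ClouderaSparkAdapter implements SparkDistributionAdapter {
    public String describe() { return "Cloudera Spark"; }
    public boolean thriftServerAvailable() { return false; }
}

final class SparkAdapters {
    /** Picks an adapter from a version string such as "2.2.0.cloudera2" or "2.2.1". */
    static SparkDistributionAdapter forVersion(String sparkVersion) {
        String v = sparkVersion.toLowerCase(Locale.ROOT);
        if (v.contains("cloudera") || v.contains("cdh")) {
            return new ClouderaSparkAdapter();
        }
        return new ApacheSparkAdapter();
    }

    public static void main(String[] args) {
        for (String version : new String[] {"2.2.1", "2.2.0.cloudera2", "2.2.0-cdh6.0.1"}) {
            SparkDistributionAdapter adapter = forVersion(version);
            System.out.println(version + " -> " + adapter.describe()
                + ", thrift server available: " + adapter.thriftServerAvailable());
        }
    }
}
```

Callers would depend only on the interface, so distribution-specific behavior stays in one small class per distribution rather than in duplicated source trees.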
@chandrasaripaka As far as I know, Spark 2.2.0 is not a stable version; it is better to consider other, more stable versions.
@chenliang613, I am OK with that. In our corporate environment we still need to live with it. We can update the JIRA and close it.
@chandrasaripaka Please check the PR https://github.com/apache/carbondata/pull/3026
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1953/
Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10206/
Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2228/
We get an error from a commit related to altertableschema, which is not there in Cloudera Spark:
https://github.com/apache/carbondata/commit/b0733ecbf380d7956dee57a9048dd7537620744e
This commit breaks the build against Cloudera 2.2.0.cloudera2.
I will check on it
-- Thanks & Regards, Ravi
Thank you. Let's use your pull request and close this one, so that we have a single place to comment.
@chandrasaripaka Please let us know whether 3026 solved your issues.
Build Failed with Spark 2.1.0, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/40/
Build Failed with Spark 2.3.2, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/39/
Build Success with Spark 2.1.0, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/1278/
Build Failed with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/41/
Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3437/