Add Spark Job Launcher tool
Users currently need to assemble the entire spark-submit command by hand to run a Spark job for batch ingestion. With so many plugins available in Pinot, this leads to a lot of classpath errors, and you also need to take care of various arguments depending on the environment you are running in. This new pinot-admin command aims to simplify this for users.
Example
Previously, you had to run:
export PINOT_VERSION=0.11.0-SNAPSHOT
export PINOT_DISTRIBUTION_DIR=/Users/kharekartik/Documents/Developer/pinot/build/

spark-submit \
  --class org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand \
  --master yarn \
  --deploy-mode client \
  --jars ${PINOT_DISTRIBUTION_DIR}/lib/pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar,${PINOT_DISTRIBUTION_DIR}/plugins/pinot-input-format/pinot-parquet/pinot-parquet-0.11.0-SNAPSHOT-shaded.jar,${PINOT_DISTRIBUTION_DIR}/plugins/pinot-file-system/pinot-s3/pinot-s3-0.11.0-SNAPSHOT-shaded.jar,${PINOT_DISTRIBUTION_DIR}/plugins-external/pinot-batch-ingestion/pinot-batch-ingestion-spark-3.2/pinot-batch-ingestion-spark-3.2-0.11.0-SNAPSHOT-shaded.jar \
  local://${PINOT_DISTRIBUTION_DIR}/lib/pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar \
  -jobSpecFile parquet_ingestion_spec_spark3_students.yml
Now you can use:
export SPARK_HOME=/usr/lib/spark/
bin/pinot-admin.sh LaunchSparkDataIngestionJob \
  -jobSpecFile parquet_ingestion_spec_spark3_students.yml \
  -pluginsToLoad pinot-parquet:pinot-s3 \
  -master yarn
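To make concrete what -pluginsToLoad saves you from, the sketch below resolves plugin names such as pinot-parquet:pinot-s3 to shaded jar paths by searching the distribution's plugin directories, which is roughly the lookup that had to be done by hand for --jars in the manual command above. It is an illustration only, not the implementation in this PR: the function name resolve_plugin_jars and the search strategy are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Pinot): turn a colon-separated list of
# plugin names into a comma-separated list of shaded jar paths, the form
# that spark-submit --jars expects.
resolve_plugin_jars() {
  local dist_dir="$1" plugins="$2"
  local jars="" plugin jar
  local IFS=':'   # split the plugin list on ':' inside this function only
  for plugin in $plugins; do
    # Assumption: each plugin ships one "<name>-<version>-shaded.jar" somewhere
    # under plugins/ or plugins-external/, as in the manual command above.
    jar=$(find "$dist_dir/plugins" "$dist_dir/plugins-external" \
            -type f -name "${plugin}-*-shaded.jar" 2>/dev/null | head -n 1)
    if [ -z "$jar" ]; then
      echo "plugin jar not found: $plugin" >&2
      return 1
    fi
    # Append with a comma separator, except before the first entry.
    jars="${jars:+$jars,}$jar"
  done
  printf '%s\n' "$jars"
}
```

For example, `resolve_plugin_jars "$PINOT_DISTRIBUTION_DIR" pinot-parquet:pinot-s3` would print the two plugin jar paths joined by a comma, or fail with a message if a plugin cannot be found.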
Additional Options
- You can also specify any additional Spark configurations using the -sparkConf option:
  -sparkConf spark.executor.cores=3:num-executors=4
- Users can also specify jars directly from S3/GCS instead of local disk, for environments like EMR:
  -pinotBaseDir s3://your-bucket/apache-pinot-0.11.0-SNAPSHOT
- You can choose whether to run Spark 2.x or 3.x with the following option (default is SPARK_3):
  -sparkVersion SPARK_2
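As a rough sketch of how these options might be put together in a script, the snippet below joins individual key=value Spark settings with ':' (the separator used in the -sparkConf example) and prints the resulting pinot-admin.sh invocation instead of running it. The helper names join_spark_conf and build_ingestion_command are hypothetical, and it assumes the options can be combined in a single invocation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: join key=value Spark settings with ':' to form the
# value passed to -sparkConf.
join_spark_conf() {
  local IFS=':'
  printf '%s\n' "$*"   # "$*" joins the arguments with the first character of IFS
}

# Hypothetical dry-run wrapper: prints the command line rather than executing
# it, so the flags can be reviewed before launching a job.
build_ingestion_command() {
  local job_spec="$1" plugins="$2" master="$3" spark_conf="$4"
  printf '%s\n' "bin/pinot-admin.sh LaunchSparkDataIngestionJob -jobSpecFile $job_spec -pluginsToLoad $plugins -master $master -sparkConf $spark_conf"
}

conf=$(join_spark_conf spark.executor.cores=3 num-executors=4)
build_ingestion_command parquet_ingestion_spec_spark3_students.yml pinot-parquet:pinot-s3 yarn "$conf"
```

Printing the command first keeps the sketch safe to run on a machine without Spark or a Pinot distribution; -pinotBaseDir and -sparkVersion could be appended in the same way.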
Codecov Report
Merging #9288 (05ebd9d) into master (a5a83aa) will decrease coverage by 42.57%. The diff coverage is 18.85%.
:exclamation: Current head 05ebd9d differs from pull request most recent head 32fe741. Consider uploading reports for the commit 32fe741 to get more accurate results
@@ Coverage Diff @@
## master #9288 +/- ##
=============================================
- Coverage 68.66% 26.09% -42.58%
+ Complexity 4680 44 -4636
=============================================
Files 1859 1855 -4
Lines 99120 99278 +158
Branches 15075 15112 +37
=============================================
- Hits 68062 25904 -42158
- Misses 26174 70783 +44609
+ Partials 4884 2591 -2293
| Flag | Coverage Δ | |
|---|---|---|
| integration1 | 26.09% <18.85%> (-0.13%) | :arrow_down: |
| unittests1 | ? | |
| unittests2 | ? | |
Flags with carried forward coverage won't be shown.
| Impacted Files | Coverage Δ | |
|---|---|---|
| ...apache/pinot/broker/api/HttpRequesterIdentity.java | 28.57% <0.00%> (-57.15%) | :arrow_down: |
| ...org/apache/pinot/broker/api/RequesterIdentity.java | 50.00% <0.00%> (-50.00%) | :arrow_down: |
| .../pinot/broker/api/resources/PinotBrokerLogger.java | 0.00% <0.00%> (ø) | |
| ...mon/segment/generation/SegmentGenerationUtils.java | 7.29% <0.00%> (-13.77%) | :arrow_down: |
| ...rg/apache/pinot/common/utils/LoggerFileServer.java | 0.00% <0.00%> (ø) | |
| ...pache/pinot/common/utils/config/InstanceUtils.java | 12.19% <0.00%> (-77.95%) | :arrow_down: |
| ...org/apache/pinot/common/utils/http/HttpClient.java | 68.13% <ø> (ø) | |
| ...er/api/access/ZkBasicAuthAccessControlFactory.java | 0.00% <ø> (ø) | |
| ...ache/pinot/controller/api/resources/Constants.java | 21.05% <ø> (-21.06%) | :arrow_down: |
| ...er/api/resources/LLCSegmentCompletionHandlers.java | 62.37% <ø> (+18.81%) | :arrow_up: |
| ... and 1365 more | | |
It currently fails in some cases, such as a multi-threaded local environment. I am working on fixing those, after which we can merge.