
Add Spark Job Launcher tool

Open KKcorps opened this issue 3 years ago • 2 comments

Users currently need to write the whole spark-submit command by hand to run a Spark job for batch ingestion. With so many plugins available in Pinot, this leads to a lot of classpath errors, and you also need to take care of various arguments depending on the environment in which you are running. This new command in pinot-admin aims to simplify this for users.

Example

Previously if you had to run

export PINOT_VERSION=0.11.0-SNAPSHOT
export PINOT_DISTRIBUTION_DIR=/Users/kharekartik/Documents/Developer/pinot/build/

spark-submit \
  --class org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand \
  --master yarn \
  --deploy-mode client \
  --jars ${PINOT_DISTRIBUTION_DIR}/lib/pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar,${PINOT_DISTRIBUTION_DIR}/plugins/pinot-input-format/pinot-parquet/pinot-parquet-0.11.0-SNAPSHOT-shaded.jar,${PINOT_DISTRIBUTION_DIR}/plugins/pinot-file-system/pinot-s3/pinot-s3-0.11.0-SNAPSHOT-shaded.jar,${PINOT_DISTRIBUTION_DIR}/plugins-external/pinot-batch-ingestion/pinot-batch-ingestion-spark-3.2/pinot-batch-ingestion-spark-3.2-0.11.0-SNAPSHOT-shaded.jar \
  local://${PINOT_DISTRIBUTION_DIR}/lib/pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar \
  -jobSpecFile parquet_ingestion_spec_spark3_students.yml

now you can use

export SPARK_HOME=/usr/lib/spark/
bin/pinot-admin.sh LaunchSparkDataIngestionJob -jobSpecFile parquet_ingestion_spec_spark3_students.yml -pluginsToLoad pinot-parquet:pinot-s3 -master yarn
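
To illustrate what `-pluginsToLoad pinot-parquet:pinot-s3` saves you from typing, here is a rough sketch of how a colon-separated plugin list could be resolved into the comma-separated value that spark-submit expects for `--jars`. This is not the code in this PR: the function name `build_jars_arg` is hypothetical, and the lookup assumes the `plugins/<category>/<plugin>/<plugin>-<version>-shaded.jar` layout visible in the old command above.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of pinot-admin): turn "pinot-parquet:pinot-s3"
# into a comma-separated list of shaded plugin jars under <base_dir>/plugins.
build_jars_arg() {
  local base_dir="$1" plugins="$2" name jar result=""
  local -a names=()

  # Split the plugin list on ':' as in the -pluginsToLoad example.
  IFS=':' read -r -a names <<< "$plugins"

  for name in "${names[@]}"; do
    # Pick the first shaded jar that lives in a directory named after the plugin.
    jar="$(find "$base_dir/plugins" -type f -path "*/${name}/*" \
             -name "${name}-*-shaded.jar" | sort | head -n 1)"
    if [ -z "$jar" ]; then
      echo "plugin not found: $name" >&2
      return 1
    fi
    # Join with commas and no spaces, since --jars takes one comma-separated value.
    result="${result:+$result,}$jar"
  done

  printf '%s\n' "$result"
}
```

For example, `build_jars_arg "$PINOT_DISTRIBUTION_DIR" pinot-parquet:pinot-s3` would print the two plugin jar paths joined by a comma, and would fail with a message on stderr if a plugin directory is missing.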

Additional Options

  • You can also pass additional Spark configurations using the -sparkConf option, as colon-separated key=value pairs: -sparkConf spark.executor.cores=3:num-executors=4

  • For environments like EMR, you can load the jars directly from S3/GCS instead of local disk: -pinotBaseDir s3://your-bucket/apache-pinot-0.11.0-SNAPSHOT

  • You can choose whether to run Spark 2.x or 3.x with the -sparkVersion option (default is SPARK_3): -sparkVersion SPARK_2
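
As a rough illustration of the `-sparkConf` format, the sketch below splits a colon-separated list of key=value pairs into repeated `--conf` arguments, which is how spark-submit accepts configuration properties. How the launcher actually translates these pairs is not shown in this thread, so the function name `expand_spark_conf` and the one-to-one mapping to `--conf` are assumptions.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not the launcher's real code): expand
# "k1=v1:k2=v2" into the arguments "--conf k1=v1 --conf k2=v2",
# printed one argument per line.
expand_spark_conf() {
  local spec="$1" pair
  local -a pairs=() out=()

  # Split on ':' as in the -sparkConf example.
  IFS=':' read -r -a pairs <<< "$spec"

  for pair in "${pairs[@]}"; do
    # Skip empty entries, e.g. from a trailing colon.
    [ -n "$pair" ] || continue
    out+=(--conf "$pair")
  done

  if [ "${#out[@]}" -gt 0 ]; then
    printf '%s\n' "${out[@]}"
  fi
}
```

One consequence of a colon separator is that a value containing a colon (an `s3://` path, for instance) would be split in the wrong place by this naive approach, so such values would need escaping or a different delimiter.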

KKcorps avatar Aug 27 '22 18:08 KKcorps

Codecov Report

Merging #9288 (05ebd9d) into master (a5a83aa) will decrease coverage by 42.57%. The diff coverage is 18.85%.

:exclamation: Current head 05ebd9d differs from pull request most recent head 32fe741. Consider uploading reports for the commit 32fe741 to get more accurate results

@@              Coverage Diff              @@
##             master    #9288       +/-   ##
=============================================
- Coverage     68.66%   26.09%   -42.58%     
+ Complexity     4680       44     -4636     
=============================================
  Files          1859     1855        -4     
  Lines         99120    99278      +158     
  Branches      15075    15112       +37     
=============================================
- Hits          68062    25904    -42158     
- Misses        26174    70783    +44609     
+ Partials       4884     2591     -2293     
Flag Coverage Δ
integration1 26.09% <18.85%> (-0.13%) :arrow_down:
unittests1 ?
unittests2 ?

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
...apache/pinot/broker/api/HttpRequesterIdentity.java 28.57% <0.00%> (-57.15%) :arrow_down:
...org/apache/pinot/broker/api/RequesterIdentity.java 50.00% <0.00%> (-50.00%) :arrow_down:
.../pinot/broker/api/resources/PinotBrokerLogger.java 0.00% <0.00%> (ø)
...mon/segment/generation/SegmentGenerationUtils.java 7.29% <0.00%> (-13.77%) :arrow_down:
...rg/apache/pinot/common/utils/LoggerFileServer.java 0.00% <0.00%> (ø)
...pache/pinot/common/utils/config/InstanceUtils.java 12.19% <0.00%> (-77.95%) :arrow_down:
...org/apache/pinot/common/utils/http/HttpClient.java 68.13% <ø> (ø)
...er/api/access/ZkBasicAuthAccessControlFactory.java 0.00% <ø> (ø)
...ache/pinot/controller/api/resources/Constants.java 21.05% <ø> (-21.06%) :arrow_down:
...er/api/resources/LLCSegmentCompletionHandlers.java 62.37% <ø> (+18.81%) :arrow_up:
... and 1365 more


codecov-commenter avatar Aug 27 '22 19:08 codecov-commenter

It is failing in some cases, such as running in a local environment with multiple threads. I am working on fixing those, after which we can merge.

KKcorps avatar Aug 29 '22 20:08 KKcorps