Spark Ingestion Job Fails with NullPointerException: "Cannot invoke getTableSpec() because spec is null"
What Happens:
- Error occurs on Spark Executor nodes during segment push phase
- The SegmentGenerationJobSpec object is null when executors try to push segments
- Fails at SegmentPushUtils.pushSegments(), line 111
Where It Fails:
// SegmentPushUtils.java:111
String tableName = spec.getTableSpec().getTableName(); // ← spec is null! NPE here
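A null guard in front of the failing line would make this easier to diagnose, because the bare NPE would become a message that points at the likely cause. The sketch below is only an illustration of failing fast with java.util.Objects.requireNonNull. It is not Pinot's code: JobSpec and TableSpec are hypothetical, simplified stand-ins for SegmentGenerationJobSpec and its table spec, and tableNameOf is a hypothetical helper.

```java
import java.util.Objects;

public class SpecGuardSketch {
    // Hypothetical stand-in for Pinot's table spec, reduced to the one field
    // the failing line reads.
    static class TableSpec {
        private final String tableName;
        TableSpec(String tableName) { this.tableName = tableName; }
        String getTableName() { return tableName; }
    }

    // Hypothetical stand-in for SegmentGenerationJobSpec.
    static class JobSpec {
        private final TableSpec tableSpec;
        JobSpec(TableSpec tableSpec) { this.tableSpec = tableSpec; }
        TableSpec getTableSpec() { return tableSpec; }
    }

    // Same lookup as the failing line, but a null spec fails fast with a
    // message naming the likely cause instead of a bare NullPointerException.
    static String tableNameOf(JobSpec spec) {
        Objects.requireNonNull(spec,
            "Job spec is null on this executor; was it loaded from a path that only exists on the driver?");
        return spec.getTableSpec().getTableName();
    }

    public static void main(String[] args) {
        System.out.println(tableNameOf(new JobSpec(new TableSpec("airlineStats"))));
        try {
            tableNameOf(null);
        } catch (NullPointerException e) {
            System.out.println("Caught: " + e.getMessage());
        }
    }
}
```

The guard does not fix the missing spec. It only makes the executor log say why the push failed.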
Why It Happens:
- The driver loads the job spec from the file successfully ✅
- The executors try to reload the spec from the same file path
- That file is not accessible on the executor nodes, which run on different machines
- spec ends up null, so the call at line 111 throws a NullPointerException
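One way to avoid re-reading the file on executors is to read it once on the driver and ship its contents with the task instead of its path. By default, Spark serializes task closures with Java serialization, so the sketch below imitates that round trip with java.io object streams. SpecPayload is a hypothetical holder class and the YAML string is illustrative only. Whether this approach fits Pinot's Spark job runners is an assumption, not something this report confirms.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class ShipSpecContentsSketch {
    // Hypothetical holder: carries the YAML text itself, not a file path,
    // so nothing has to be re-read from disk on the executor.
    static final class SpecPayload implements Serializable {
        private static final long serialVersionUID = 1L;
        final String yaml;
        SpecPayload(String yaml) { this.yaml = yaml; }
    }

    // "Driver" side: serialize the payload, as Spark does with task closures.
    static byte[] serialize(SpecPayload payload) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(payload);
        }
        return bytes.toByteArray();
    }

    // "Executor" side: deserialize the payload; no file system access is needed.
    static SpecPayload deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (SpecPayload) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        // Illustrative spec content; in practice this would be the file read on the driver.
        String yaml = "jobType: SegmentCreationAndTarPush\n";
        SpecPayload received = deserialize(serialize(new SpecPayload(yaml)));
        System.out.println(received.yaml.equals(yaml));
    }
}
```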
Root Cause:
Pinot's IngestionJobLauncher uses FileReader which:
- ❌ Only works with LOCAL files
- ❌ Cannot read HDFS URIs (hdfs://...)
- ❌ Cannot read S3 URIs (s3://...)
- ❌ Cannot access files on different machines
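The FileReader limitation can be reproduced with the JDK alone. java.io.FileReader treats its argument as a local file name, so it looks up an hdfs:// or s3:// string as a relative path on local disk and throws FileNotFoundException. The helper names (isLocalPath, opensWithFileReader) and the example host and bucket names are hypothetical. Reading such URIs would need a file system layer that understands the scheme.

```java
import java.io.FileReader;
import java.io.IOException;
import java.net.URI;

public class FileReaderUriSketch {
    // True when the string has no URI scheme or the "file" scheme, i.e. it
    // names something on the local machine's disk.
    static boolean isLocalPath(String path) {
        String scheme = URI.create(path).getScheme();
        return scheme == null || "file".equalsIgnoreCase(scheme);
    }

    // Tries to open the string with FileReader, the way a local-only loader
    // would. FileNotFoundException is a subclass of IOException.
    static boolean opensWithFileReader(String path) {
        try (FileReader reader = new FileReader(path)) {
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical remote locations for the job spec.
        String[] paths = {
            "hdfs://namenode:8020/tmp/sparkIngestionJobSpec.yaml",
            "s3://my-bucket/sparkIngestionJobSpec.yaml"
        };
        for (String p : paths) {
            System.out.printf("%s local=%b opensWithFileReader=%b%n",
                p, isLocalPath(p), opensWithFileReader(p));
        }
    }
}
```

Both remote strings are reported as non-local, and FileReader cannot open either of them.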
@xiangfu0
spark-submit \
  --class org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand \
  --master yarn \
  --deploy-mode cluster \
  --conf "spark.driver.extraJavaOptions=-Dplugins.dir=/usr/vdp/3.4.1.0-6/apache-pinot-1.4.0-bin/plugins-external,/usr/vdp/3.4.1.0-6/apache-pinot-1.4.0-bin/plugins" \
  --conf "spark.driver.extraClassPath=/usr/vdp/3.4.1.0-6/apache-pinot-1.4.0-bin/plugins-external/pinot-batch-ingestion/pinot-batch-ingestion-spark-3/pinot-batch-ingestion-spark-3-1.4.0-shaded.jar:/usr/vdp/3.4.1.0-6/apache-pinot-1.4.0-bin/lib/pinot-all-1.4.0-jar-with-dependencies.jar:/usr/vdp/3.4.1.0-6/apache-pinot-1.4.0-bin/plugins/pinot-file-system/pinot-hdfs/pinot-hdfs-1.4.0-shaded.jar" \
  --conf "spark.executor.extraClassPath=/usr/vdp/3.4.1.0-6/apache-pinot-1.4.0-bin/plugins-external/pinot-batch-ingestion/pinot-batch-ingestion-spark-3/pinot-batch-ingestion-spark-3-1.4.0-shaded.jar:/usr/vdp/3.4.1.0-6/apache-pinot-1.4.0-bin/lib/pinot-all-1.4.0-jar-with-dependencies.jar:/usr/vdp/3.4.1.0-6/apache-pinot-1.4.0-bin/plugins/pinot-file-system/pinot-hdfs/pinot-hdfs-1.4.0-shaded.jar" \
  --jars "/usr/vdp/3.4.1.0-6/apache-pinot-1.4.0-bin/plugins-external/pinot-batch-ingestion/pinot-batch-ingestion-spark-3/pinot-batch-ingestion-spark-3-1.4.0-shaded.jar,/usr/vdp/3.4.1.0-6/apache-pinot-1.4.0-bin/lib/pinot-all-1.4.0-jar-with-dependencies.jar,/usr/vdp/3.4.1.0-6/apache-pinot-1.4.0-bin/plugins/pinot-file-system/pinot-hdfs/pinot-hdfs-1.4.0-shaded.jar" \
  --files /usr/vdp/3.4.1.0-6/apache-pinot-1.4.0-bin/examples/batch/airlineStats/sparkIngestionJobSpec.yaml \
  local:///usr/vdp/3.4.1.0-6/apache-pinot-1.4.0-bin/lib/pinot-all-1.4.0-jar-with-dependencies.jar \
  -jobSpecFile sparkIngestionJobSpec.yaml