
Spark Ingestion Job Fails with NullPointerException: "Cannot invoke getTableSpec() because spec is null"

Open Akanksha-kedia opened this issue 4 months ago • 1 comment

What Happens:

  • Error occurs on Spark Executor nodes during segment push phase
  • The SegmentGenerationJobSpec object is null when executors try to push segments
  • Fails at SegmentPushUtils.pushSegments() line 111

Where It Fails:

// SegmentPushUtils.java:111
String tableName = spec.getTableSpec().getTableName();  // ← spec is null! NPE here

Why It Happens:

  1. Driver loads job spec from file successfully ✅
  2. Executors try to reload spec from the same file path
  3. File not accessible on executor nodes (different machines)
  4. spec becomes null → NullPointerException
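
The failure chain above can be sketched in isolation. This is a minimal illustration, not Pinot's code: `TableSpec` and `JobSpec` are hypothetical stand-ins that only mirror the getter names from the stack trace, and `resolveTableName` is an invented helper. It shows how an explicit null check at the point of use would turn the opaque NPE into a message that points at the unreadable spec file.

```java
import java.util.Objects;

public class SpecNullCheckSketch {
    // Hypothetical stand-in for the table section of a job spec.
    static class TableSpec {
        private final String tableName;
        TableSpec(String tableName) { this.tableName = tableName; }
        String getTableName() { return tableName; }
    }

    // Hypothetical stand-in for the job spec that the executor receives.
    static class JobSpec {
        private final TableSpec tableSpec;
        JobSpec(TableSpec tableSpec) { this.tableSpec = tableSpec; }
        TableSpec getTableSpec() { return tableSpec; }
    }

    // Fails fast with a descriptive message instead of a bare NPE
    // when the spec could not be loaded on this node.
    static String resolveTableName(JobSpec spec) {
        Objects.requireNonNull(spec,
            "job spec is null on this node; was the spec file readable here?");
        return spec.getTableSpec().getTableName();
    }

    public static void main(String[] args) {
        // Driver-like case: the spec was loaded, so the lookup succeeds.
        JobSpec loaded = new JobSpec(new TableSpec("airlineStats"));
        System.out.println(resolveTableName(loaded));

        // Executor-like case: the spec is null, so the check fires.
        try {
            resolveTableName(null);
        } catch (NullPointerException e) {
            System.out.println("Failed fast: " + e.getMessage());
        }
    }
}
```

A check like this does not fix the underlying problem (the spec still has to reach the executors), but it makes the cause visible in the executor logs.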

Root Cause:

Pinot's IngestionJobLauncher reads the job spec with java.io.FileReader, which:

  • ❌ Only works with LOCAL files
  • ❌ Cannot read HDFS URIs (hdfs://...)
  • ❌ Cannot read S3 URIs (s3://...)
  • ❌ Cannot access files on different machines
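
The local-only behaviour of `java.io.FileReader` can be reproduced without Pinot or Spark. The sketch below is a standalone illustration, not the launcher's actual code: `readFirstLine` is a hypothetical helper, and the `hdfs://namenode:8020/...` URI and the file content are placeholders. `FileReader` treats its argument as a path on the local filesystem, so a URI string is interpreted as a (nonexistent) local path.

```java
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class LocalOnlyReaderSketch {
    // Hypothetical helper: reads the first line of a file via FileReader.
    static String readFirstLine(String path) throws IOException {
        try (BufferedReader reader = new BufferedReader(new FileReader(path))) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        // A local file works, as it does on the driver.
        Path local = Files.createTempFile("jobSpec", ".yaml");
        Files.write(local, "executionFrameworkSpec:".getBytes(StandardCharsets.UTF_8));
        System.out.println("local read: " + readFirstLine(local.toString()));

        // A URI string is treated as a local path and cannot be opened.
        // The host and path here are placeholders.
        try {
            readFirstLine("hdfs://namenode:8020/tmp/jobSpec.yaml");
        } catch (FileNotFoundException e) {
            System.out.println("URI read failed: " + e.getMessage());
        } finally {
            Files.deleteIfExists(local);
        }
    }
}
```

Reading from HDFS or S3 would need a filesystem-aware client (for example Hadoop's `FileSystem` API) rather than `FileReader`; which mechanism Pinot should use here is left open in this issue.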

Akanksha-kedia avatar Nov 05 '25 09:11 Akanksha-kedia

@xiangfu0

spark-submit \
  --class org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand \
  --master yarn \
  --deploy-mode cluster \
  --conf "spark.driver.extraJavaOptions=-Dplugins.dir=/usr/vdp/3.4.1.0-6/apache-pinot-1.4.0-bin/plugins-external,/usr/vdp/3.4.1.0-6/apache-pinot-1.4.0-bin/plugins" \
  --conf "spark.driver.extraClassPath=/usr/vdp/3.4.1.0-6/apache-pinot-1.4.0-bin/plugins-external/pinot-batch-ingestion/pinot-batch-ingestion-spark-3/pinot-batch-ingestion-spark-3-1.4.0-shaded.jar:/usr/vdp/3.4.1.0-6/apache-pinot-1.4.0-bin/lib/pinot-all-1.4.0-jar-with-dependencies.jar:/usr/vdp/3.4.1.0-6/apache-pinot-1.4.0-bin/plugins/pinot-file-system/pinot-hdfs/pinot-hdfs-1.4.0-shaded.jar" \
  --conf "spark.executor.extraClassPath=/usr/vdp/3.4.1.0-6/apache-pinot-1.4.0-bin/plugins-external/pinot-batch-ingestion/pinot-batch-ingestion-spark-3/pinot-batch-ingestion-spark-3-1.4.0-shaded.jar:/usr/vdp/3.4.1.0-6/apache-pinot-1.4.0-bin/lib/pinot-all-1.4.0-jar-with-dependencies.jar:/usr/vdp/3.4.1.0-6/apache-pinot-1.4.0-bin/plugins/pinot-file-system/pinot-hdfs/pinot-hdfs-1.4.0-shaded.jar" \
  --jars "/usr/vdp/3.4.1.0-6/apache-pinot-1.4.0-bin/plugins-external/pinot-batch-ingestion/pinot-batch-ingestion-spark-3/pinot-batch-ingestion-spark-3-1.4.0-shaded.jar,/usr/vdp/3.4.1.0-6/apache-pinot-1.4.0-bin/lib/pinot-all-1.4.0-jar-with-dependencies.jar,/usr/vdp/3.4.1.0-6/apache-pinot-1.4.0-bin/plugins/pinot-file-system/pinot-hdfs/pinot-hdfs-1.4.0-shaded.jar" \
  --files /usr/vdp/3.4.1.0-6/apache-pinot-1.4.0-bin/examples/batch/airlineStats/sparkIngestionJobSpec.yaml \
  local:///usr/vdp/3.4.1.0-6/apache-pinot-1.4.0-bin/lib/pinot-all-1.4.0-jar-with-dependencies.jar \
  -jobSpecFile sparkIngestionJobSpec.yaml

Akanksha-kedia avatar Nov 05 '25 09:11 Akanksha-kedia