Spark jobs success/fail status in the UI/metrics
Hi :-)
I know nothing about Spark jobs themselves, but I run CDM in standalone mode, and the Spark UI is available on port 4040.
I can see the job (numParts) running, and the Web UI summary shows every job as "SUCCESS", even when CDM reports them as "FAIL".
Is this expected?
I enabled the Spark Prometheus metrics (in spark-3.5.5-bin-hadoop3-scala2.13/conf/metrics.properties) to follow successful/failed jobs, but since they are based on the same information, everything shows as "success" :)
I can still check the trackRun table for FAIL entries, but I wonder if there is another way.
Thank you,
Thanks for your question @Skunnyk! What were you looking to find out using the Spark UI here during the migration?
Hi @Skunnyk,
CDM tracks and handles failures internally within each parallelized Spark task, so the Spark UI reports everything as SUCCESS: from Spark's perspective, the tasks complete without errors.
Spark only tracks the overall status of tasks (e.g., successful, failed, or running), whereas CDM tasks have their own detailed lifecycle (NOT_STARTED, STARTED, PASS, FAIL, DIFF, DIFF_CORRECTED, ENDED). We do not let failures abort the tasks abruptly, because other reporting/cleanup actions still need to happen after a failure.
We may be able to tweak the app to report failures to the Spark UI (although we would prefer not to), but using the Spark UI for reporting was never the plan. Our recommended way to track/monitor the jobs is the trackRun feature.
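To illustrate why Spark sees SUCCESS, here is a minimal pure-Python sketch (no Spark involved, and not CDM's actual code) of a task that catches its own errors, records a CDM-style status, and then returns normally. The `run_partition` function and the `status_log` list are hypothetical stand-ins for a Spark task and the trackRun table; only the status names come from the reply above.

```python
from enum import Enum


class RunStatus(Enum):
    # Status names as listed above; ordering/usage here is illustrative only.
    NOT_STARTED = "NOT_STARTED"
    STARTED = "STARTED"
    PASS = "PASS"
    FAIL = "FAIL"
    DIFF = "DIFF"
    DIFF_CORRECTED = "DIFF_CORRECTED"
    ENDED = "ENDED"


def run_partition(part_id, work, status_log):
    """Hypothetical task body: process one token range, never raise."""
    status_log.append((part_id, RunStatus.STARTED))
    try:
        work(part_id)
        status_log.append((part_id, RunStatus.PASS))
    except Exception:
        # The failure is recorded instead of being re-raised...
        status_log.append((part_id, RunStatus.FAIL))
    finally:
        # ...so reporting/cleanup code placed here still runs after a failure.
        pass
    # Returning normally is what the scheduler counts as a successful task.
    return part_id


def flaky_work(part_id):
    # Simulated migration work that fails for one partition.
    if part_id == 2:
        raise RuntimeError("write timeout")


status_log = []
completed = [run_partition(p, flaky_work, status_log) for p in range(4)]
failed = [p for p, s in status_log if s is RunStatus.FAIL]
print("tasks completed:", completed)  # → tasks completed: [0, 1, 2, 3]
print("CDM-style failures:", failed)  # → CDM-style failures: [2]
```

Every task "completes" from the scheduler's point of view, while the failure is only visible in the recorded status, which is why the Spark UI and Spark's Prometheus metrics both show success.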
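Since the trackRun data is the source of truth, one way to get graphs similar to Spark's Prometheus metrics is to periodically count trackRun rows per status and expose them in the Prometheus text exposition format. This sketch assumes the rows have already been read from the trackRun table (for example with a CQL query) into `(run_id, status)` tuples; the metric name `cdm_partitions` and the row shape are hypothetical, not part of CDM.

```python
from collections import Counter


def to_prometheus_text(rows, metric="cdm_partitions"):
    """Render per-run, per-status counts in Prometheus text exposition format.

    rows: iterable of (run_id, status) tuples (hypothetical shape).
    """
    counts = Counter((str(run_id), status) for run_id, status in rows)
    lines = [
        f"# HELP {metric} Number of token-range partitions by CDM status.",
        f"# TYPE {metric} gauge",
    ]
    # Sorted output keeps the file stable between scrapes.
    for (run_id, status), n in sorted(counts.items()):
        lines.append(f'{metric}{{run_id="{run_id}",status="{status}"}} {n}')
    return "\n".join(lines) + "\n"


sample = [(1001, "PASS"), (1001, "PASS"), (1001, "FAIL"), (1001, "STARTED")]
print(to_prometheus_text(sample), end="")
```

The resulting text could, for example, be written to a file read by node_exporter's textfile collector, or served over HTTP for Prometheus to scrape, so progress can be graphed over a multi-day migration.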
Hi, thank you for your answers.
@msmygit: I wanted to use the Prometheus metrics generated by the Spark process to follow/graph successful/failed tasks, because the migration will run for a couple of days :-)
@pravinbhat OK, that's what I thought, thank you.
CDM logs/output are a bit hard to follow (this could probably be improved with some log4j configuration), and with the trackRun feature it can be hard to see where we stand when multiple runs/previousId are done for a big table with failed tasks.
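When a big table has been processed over several runs, a per-token-range view of the latest status is often easier to read than the raw trackRun rows. A hedged sketch, assuming rows of `(run_id, token_min, status)` where a larger `run_id` means a later run; these names and that ordering are hypothetical and do not reflect CDM's actual table schema.

```python
from collections import Counter


def latest_status_per_range(rows):
    """Keep only the most recent status for each token range.

    rows: iterable of (run_id, token_min, status); assumes a larger run_id
    means a later run (hypothetical convention).
    """
    latest = {}
    for run_id, token_min, status in rows:
        seen = latest.get(token_min)
        if seen is None or run_id > seen[0]:
            latest[token_min] = (run_id, status)
    return {token: status for token, (_, status) in latest.items()}


def progress_summary(rows):
    """Count token ranges by their latest status across all runs."""
    return Counter(latest_status_per_range(rows).values())


rows = [
    (1, -100, "PASS"), (1, 0, "FAIL"), (1, 100, "FAIL"),
    (2, 0, "PASS"),    # a rerun fixed range 0
    (3, 100, "FAIL"),  # range 100 is still failing
]
print(progress_summary(rows))
```

Collapsing the runs this way answers "how many ranges are still failing right now?" without having to mentally merge each rerun's rows.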