HIVE-28093: Re-execute DAG in case of NoCurrentDAGException

Open: abstractdog opened this issue 11 months ago • 1 comment

What changes were proposed in this pull request?

Related to TEZ-4543: this change re-runs the DAG if the client sees a DAG_FAILED status caused by a NoCurrentDAGException in the AM.
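As a rough illustration of the idea, and not the actual Hive implementation, the retry decision could be sketched as below. The function name `should_reexecute`, the `max_retries` parameter and the approach of matching on the message text are assumptions made for this sketch; only the message string itself is taken from the log output quoted in this PR.

```python
# Hypothetical sketch: decide whether to re-run a query after a DAG failure.
# Only the marker text comes from the log in this PR; the rest is illustrative.
NO_DAG_MARKER = "No running DAG at present"


def should_reexecute(error_message, dags_seen, max_retries=3):
    """Return True if the failure looks like a lost AM and retries remain."""
    if error_message is None or NO_DAG_MARKER not in error_message:
        return False  # unrelated failure: leave it to other handling
    # Assumption: each DAG id seen so far corresponds to one attempt.
    return len(dags_seen) <= max_retries
```

For example, `should_reexecute("No running DAG at present", ["dag_1708961199044_0002_1"])` allows a retry, while an unrelated error message does not.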

Why are the changes needed?

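TEZ-4543 makes the client return quickly when a restarted AM is not running the queried DAG. With that in place, Hive can recognize this failure and re-execute the DAG instead of failing the query.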

Does this PR introduce any user-facing change?

No.

Is the change a dependency upgrade?

No.

How was this patch tested?

Tested on a cluster; unit tests for the AM and Hive sides have already been added.

The following is logged when dag_1708961199044_0002_1 had failed earlier and, because I kept injecting an OOM into the AM (making it crash in a k8s environment), dag_1708961199044_0003_1 failed as well:

hiveserver2 <14>1 2024-02-26T16:00:37.730Z hiveserver2-0 hiveserver2 1 dedef3f4-339f-4ba3-a6ae-300751d3561d [mdc@18060 class="reexec.ReExecuteLostAMQueryPlugin" dagId="dag_1708961199044_0003_1" level="INFO" operationLogLevel="EXECUTION" queryId="hive_20240226155836_6b1e9eb9-efd7-42fd-8872-f4189c5dda3a" sessionId="9e4cb344-ad7f-4344-9b24-aedaf0e73bf4" thread="HiveServer2-Background-Pool: Thread-129"] Got exception message: No running DAG at present retryPossible: true, dags seen so far: [dag_1708961199044_0002_1, dag_1708961199044_0003_1]
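When checking such log lines during testing, the relevant fields can be pulled out with a few regular expressions. This is a minimal sketch, assuming the line keeps the shape shown above; `parse_reexec_log` is a hypothetical helper, and `LOG_LINE` is an abbreviated copy of the line from this PR.

```python
import re

# Abbreviated copy of the log line above (some MDC fields omitted).
LOG_LINE = (
    '[mdc@18060 class="reexec.ReExecuteLostAMQueryPlugin" '
    'dagId="dag_1708961199044_0003_1" level="INFO"] '
    'Got exception message: No running DAG at present retryPossible: true, '
    'dags seen so far: [dag_1708961199044_0002_1, dag_1708961199044_0003_1]'
)


def parse_reexec_log(line):
    """Extract the current DAG id, the retry flag and the DAGs seen so far."""
    fields = dict(re.findall(r'(\w+)="([^"]*)"', line))  # key="value" pairs
    retry = re.search(r'retryPossible: (true|false)', line)
    dags = re.search(r'dags seen so far: \[([^\]]*)\]', line)
    seen = dags.group(1).split(",") if dags else []
    return {
        "dagId": fields.get("dagId"),
        "retryPossible": retry is not None and retry.group(1) == "true",
        "dagsSeen": [d.strip() for d in seen if d.strip()],
    }
```

Calling `parse_reexec_log(LOG_LINE)` yields the current DAG id, the retry flag as a boolean, and both DAG ids in the order they were seen.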

abstractdog, Feb 27 '24 09:02

Minor stuff, otherwise looks good.

Thanks a lot, I have addressed your comments.

abstractdog, Feb 28 '24 13:02

Quality Gate passed

Issues
2 New issues
0 Accepted issues

Measures
1 Security Hotspot
No data about Coverage
No data about Duplication

See analysis details on SonarCloud

sonarqubecloud[bot], Feb 28 '24 18:02