[FEA] Qualification Tool: For Databricks eventlog capture more information in output csv file

Open viadea opened this issue 3 years ago • 0 comments

Currently, when the Qualification tool is used to process many event logs from Databricks, the output typically looks like this:

==================================================================================================================================================================================
|        App Name|                 App ID|App Duration|SQL DF Duration|GPU Opportunity|Estimated GPU Duration|Estimated GPU Speedup|Estimated GPU Time Saved|      Recommendation|
==================================================================================================================================================================================
|Databricks Shell|app-20220723004538-0000|      361338|         312741|         251138|             196206.05|                 1.84|               165131.94|         Recommended|
|Databricks Shell|app-20220723003957-0000|       82457|          34752|          27630|              64289.11|                 1.28|                18167.88|     Not Recommended|
|Databricks Shell|app-20220721173352-0000|       47549|           6439|           3593|              45326.82|                 1.04|                 2222.17|     Not Recommended|
==================================================================================================================================================================================

Most of the time, the app name ("Databricks Shell") and the app ID are not enough for us to identify which job a row refers to. For example, we cannot tell where the event log for app-20220723004538-0000 is, because Databricks event logs are named "eventlog", "eventlog.2", etc., with no app ID in the file name.

If the tool could capture the properties below from the event log and save the mapping between the app ID and them in the output CSV file, it would help us find out which job a specific app ID belongs to:

spark.databricks.clusterUsageTags.clusterLogDestination	dbfs:/cluster-logs/xxx/testjoincpu_recom/
spark.databricks.clusterUsageTags.clusterId 0723-004502-qy8s8g1d
spark.databricks.clusterUsageTags.clusterAllTags [{"key":"Vendor","value":"Databricks"},{"key":"Creator","value":"[email protected]"},{"key":"ClusterName","value":"job-708382488055798-run-52523-testbyxxx_job_cluster"},{"key":"ClusterId","value":"0723-004502-qy8s8g1d"},{"key":"JobId","value":"708382488055798"},{"key":"RunName","value":"test_join_cpu_recommended"},{"key":"Name","value":"8721196619973675-2699ba82-ab92-479e-8ab1-102a1d7b07a6-worker"}]
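
To illustrate the mapping being requested, here is a minimal Python sketch, not the Qualification tool's implementation. It assumes the event log is the usual one-JSON-object-per-line format, that the app ID appears in the `SparkListenerApplicationStart` event under `App ID`, and that the properties above appear under `Spark Properties` in the `SparkListenerEnvironmentUpdate` event. The function name `extract_app_metadata` and the shape of the returned dictionary are hypothetical.

```python
import json

# Property names taken from the list above.
PREFIX = "spark.databricks.clusterUsageTags."
WANTED_KEYS = (
    PREFIX + "clusterLogDestination",
    PREFIX + "clusterId",
    PREFIX + "clusterAllTags",
)


def extract_app_metadata(lines):
    """Map an app ID to the Databricks cluster properties in one event log.

    `lines` is any iterable of event log lines (for example an open file).
    Returns a dict with "appId", the raw property values that were found,
    and "tags": clusterAllTags decoded into a plain key -> value dict.
    """
    info = {"appId": None, "tags": {}}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or non-JSON lines
        if not isinstance(event, dict):
            continue
        kind = event.get("Event")
        if kind == "SparkListenerApplicationStart":
            info["appId"] = event.get("App ID")
        elif kind == "SparkListenerEnvironmentUpdate":
            props = event.get("Spark Properties") or {}
            for key in WANTED_KEYS:
                if key in props:
                    info[key] = props[key]

    # clusterAllTags is itself a JSON string: a list of {"key", "value"} pairs.
    raw_tags = info.get(PREFIX + "clusterAllTags")
    if raw_tags:
        try:
            info["tags"] = {t["key"]: t["value"] for t in json.loads(raw_tags)}
        except (json.JSONDecodeError, KeyError, TypeError):
            pass  # leave "tags" empty if the value is not in the expected shape
    return info
```

A caller could run this over each "eventlog" file (for example `with open(path) as f: extract_app_metadata(f)`) and join the result to the CSV rows on the app ID, so that fields such as `JobId`, `RunName`, and the cluster log destination sit next to each row.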

viadea avatar Jul 23 '22 01:07 viadea