clusterdata

DAGs have loops in 2018 traces

Open pouyahmdn opened this issue 3 years ago • 0 comments

Hi. Thank you for sharing the cluster data. They are a great help!!!

I've been looking through the 2018 traces, and intend to use them for simulation. However, I've come across DAGs that have loops in them.

For instance, look at job 'j_1053726' in 'batch_task.csv'. I'll highlight 2 tasks:

M5_14_17_28_30_40_42_52_54_64_66_76_78_88_90_100_1,19,j_1053726,1,Terminated,349710,349750,50,0.3
R1_5,1,j_1053726,1,Terminated,349710,349754,50,0.2

Task 1 depends on Task 5 and Task 5 depends on Task 1. I'm guessing the task name was cut off (the last 1 in the first task was probably 102). Is there a way to correct this other than ignoring such DAGs?
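For anyone who wants to flag such jobs automatically, here is a minimal sketch of a cycle check. It assumes the naming convention implied above: the first underscore-separated token is a letter prefix followed by the task's own number, and the remaining numeric tokens are the tasks it depends on. The helper names `parse_task_name` and `find_cycle` are made up for this illustration and are not part of the trace tooling.

```python
import re
from collections import defaultdict


def parse_task_name(task_name):
    """Split a name like 'R1_5' into (task_id, [dependency tokens])."""
    head, *deps = task_name.split("_")
    match = re.fullmatch(r"[A-Za-z]+(\d+)", head)
    if match is None:
        # Assumption: names without a trailing number carry no DAG info.
        return None, []
    return match.group(1), deps


def find_cycle(task_names):
    """Return one dependency cycle as a list of task ids, or None."""
    graph = defaultdict(list)
    for name in task_names:
        task_id, deps = parse_task_name(name)
        if task_id is not None:
            # Only numeric tokens are treated as task references here.
            graph[task_id].extend(d for d in deps if d.isdigit())

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color = defaultdict(int)
    path = []

    def visit(node):
        color[node] = GREY
        path.append(node)
        for dep in graph.get(node, []):
            if color[dep] == GREY:
                # Back edge: dep is already on the current path.
                return path[path.index(dep):] + [dep]
            if color[dep] == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for node in list(graph):
        if color[node] == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


names = [
    "M5_14_17_28_30_40_42_52_54_64_66_76_78_88_90_100_1",
    "R1_5",
]
print(find_cycle(names))  # ['5', '1', '5']
```

This only detects the loop. It cannot tell whether the trailing 1 was really a truncated 102, so repairing the name would still be a guess.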

Also, some tasks depend on 'Stgx', where x is a number. A few examples:

J10_7_9_Stg4,69,j_4160894,1,Terminated,348729,349024,100,0.39
M7_Stg3,9,j_3424927,1,Terminated,409967,410068,100,0.39
R2_1_Stg8,39,j_642358,1,Running,675174,675174,,

What does this mean? There are no tasks that start with 'Stgx'; there are only dependencies on them.
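To get a sense of how common these are, a rough sketch like the following lists every non-numeric dependency token per job. It assumes the column order seen in the rows above (task name first, job name third) and skips names whose first token is not a letter prefix plus a number. The function name `non_numeric_dependencies` is hypothetical.

```python
import csv
import re
from collections import defaultdict


def non_numeric_dependencies(lines):
    """Map job name -> sorted dependency tokens that are not plain numbers."""
    found = defaultdict(set)
    for row in csv.reader(lines):
        if len(row) < 3:
            continue  # skip malformed or empty rows
        task_name, job_name = row[0], row[2]
        head, *deps = task_name.split("_")
        # Assumption: only names like 'M7' or 'J10' encode DAG information.
        if re.fullmatch(r"[A-Za-z]+\d+", head) is None:
            continue
        for dep in deps:
            if not dep.isdigit():
                found[job_name].add(dep)
    return {job: sorted(tokens) for job, tokens in found.items()}


sample = [
    "J10_7_9_Stg4,69,j_4160894,1,Terminated,348729,349024,100,0.39",
    "M7_Stg3,9,j_3424927,1,Terminated,409967,410068,100,0.39",
    "R2_1_Stg8,39,j_642358,1,Running,675174,675174,,",
]
result = non_numeric_dependencies(sample)
print(result)
```

For the real file, an open file handle on 'batch_task.csv' can be passed in place of `sample`, since `csv.reader` accepts any iterable of lines.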

pouyahmdn, May 02 '22 19:05