TornadoVM
TornadoVM copied to clipboard
Refactor the TimeProfiler
Describe the bug
Currently, this method (addValueToMetric
) expects a taskName
to be passed.
The addValueToMetric
method is called for the TASK_COPY_IN_SIZE_BYES
and TASK_COPY_OUT_SIZE_BYTES
profile type, which is not correct. Objects passed as parameters to tasks belong to the whole task schedule context (i.e multiple tasks that belong to the same task schedule and use the same object parameter will result in a single COPY_IN/STREAM_IN).
Therefore, I think the TASK_COPY_IN/OUT_SIZE_BYTES
should be profiled per task schedule, and not individual tasks.
How To Reproduce
To reproduce, run the test below with the -Dtornado.profiler=True
flag
public static void add(int[] a, int[] b) {
for (@Parallel int i = 0; i < a.length; i++) {
b[i] = a[i] + a[i];
}
}
public static void mult(int[] a, int[] b) {
for (@Parallel int i = 0; i < b.length; i++) {
b[i] = b[i] + a[i] * 3;
}
}
public static void main(String[] args) {
int n = 32;
int[] a = new int[n];
int[] b = new int[n];
TaskSchedule ts = new TaskSchedule("s0")
.task("t0", Main::add, a, b)
.task("t1", Main::mult, a, b)
.streamOut(b);
ts.execute();
}
The output produced by the profiler is:
{
"s0": {
"TOTAL_DISPATCH_DATA_TRANSFERS_TIME": "51936",
"TOTAL_TASK_SCHEDULE_TIME": "298600160",
"TOTAL_DRIVER_COMPILE_TIME": "169628101",
"TOTAL_GRAAL_COMPILE_TIME": "51983174",
"TOTAL_KERNEL_TIME": "18176",
"TOTAL_DISPATCH_KERNEL_TIME": "15200",
"TOTAL_BYTE_CODE_GENERATION": "5949780",
"COPY_IN_TIME": "4512",
"COPY_OUT_TIME": "1888",
"s0.t0": {
"METHOD": "Main.add",
"DEVICE_ID": "0:0",
"DEVICE": "GeForce GTX 1650",
"TASK_COPY_OUT_SIZE_BYTES": "152",
"TASK_COPY_IN_SIZE_BYTES": "344",
"TASK_COMPILE_GRAAL_TIME": "35695419",
"TASK_KERNEL_TIME": "9984",
"TASK_COMPILE_DRIVER_TIME": "88543309"
},
"s0.t1": {
"METHOD": "Main.mult",
"DEVICE_ID": "0:0",
"DEVICE": "GeForce GTX 1650",
"TASK_COPY_IN_SIZE_BYTES": "40",
"TASK_COMPILE_GRAAL_TIME": "16287755",
"TASK_KERNEL_TIME": "8192",
"TASK_COMPILE_DRIVER_TIME": "81084792"
}
}
}
Even though objects a
and b
are used by both t0
and t1
, the TASK_COPY_IN_SIZE_BYTES
and TASK_COPY_OUT_SIZE_BYTES
are only reported for t0
.
Expected behavior The expected output for the test above would be:
{
"s0": {
"TOTAL_DISPATCH_DATA_TRANSFERS_TIME": "51936",
"TOTAL_TASK_SCHEDULE_TIME": "298600160",
"TOTAL_DRIVER_COMPILE_TIME": "169628101",
"TOTAL_GRAAL_COMPILE_TIME": "51983174",
"TOTAL_KERNEL_TIME": "18176",
"TOTAL_DISPATCH_KERNEL_TIME": "15200",
"TOTAL_BYTE_CODE_GENERATION": "5949780",
"COPY_IN_TIME": "4512",
"COPY_OUT_TIME": "1888",
"COPY_OUT_SIZE_BYTES": "XXXX",
"COPY_IN_SIZE_BYTES": "XXXX",
"s0.t0": {
"METHOD": "Main.add",
"DEVICE_ID": "0:0",
"DEVICE": "GeForce GTX 1650",
"TASK_COMPILE_GRAAL_TIME": "35695419",
"TASK_KERNEL_TIME": "9984",
"TASK_COMPILE_DRIVER_TIME": "88543309"
},
"s0.t1": {
"METHOD": "Main.mult",
"DEVICE_ID": "0:0",
"DEVICE": "GeForce GTX 1650",
"TASK_COMPILE_GRAAL_TIME": "16287755",
"TASK_KERNEL_TIME": "8192",
"TASK_COMPILE_DRIVER_TIME": "81084792"
}
}
}
Additional context Also, Javadoc for the datastructures of the profiler should be added.
Thank you @gigiblender for the report. This is by design. We want each task to have its own profiler. This is especially beneficial in a multi-task , multi-device environment. But I also agree that for some cases (e.g., using the task.sync(objects)
) the profiler should add the metrics at the task-schedule level, rather than the task-level.
I suggest refactoring this part to add both options.
- Add metrics for copies at the task-schedule level
- When possible, keep the copy metrics for each individual task.