TornadoVM
Refactor the TimeProfiler
Describe the bug
Currently, the addValueToMetric method expects a taskName to be passed.
The addValueToMetric method is called for the TASK_COPY_IN_SIZE_BYTES and TASK_COPY_OUT_SIZE_BYTES profile types, which is not correct. Objects passed as parameters to tasks belong to the whole task schedule context (i.e., multiple tasks that belong to the same task schedule and use the same object parameter will result in a single COPY_IN/STREAM_IN).
Therefore, I think the TASK_COPY_IN/OUT_SIZE_BYTES metrics should be profiled per task schedule, not per individual task.
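To make the argument above concrete, here is a minimal sketch of the two accounting strategies. It is not TornadoVM code: the class CopyInAccounting and its methods are hypothetical, and the size computation assumes only the array payload is counted (object headers are ignored), so the numbers will not match the profiler output in this report.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical helper (not part of TornadoVM): contrasts per-task and
// per-schedule accounting of copy-in sizes.
final class CopyInAccounting {

    // Payload size of an int[] in bytes. Assumption: object headers are ignored.
    static long sizeInBytes(int[] array) {
        return (long) array.length * Integer.BYTES;
    }

    // Per-task view: every parameter of every task is counted,
    // even if the same object was already counted for an earlier task.
    static long perTaskSum(List<int[][]> tasks) {
        long total = 0;
        for (int[][] params : tasks) {
            for (int[] p : params) {
                total += sizeInBytes(p);
            }
        }
        return total;
    }

    // Per-schedule view: each distinct object (compared by identity)
    // is counted once, mirroring a single COPY_IN for a shared parameter.
    static long perScheduleSum(List<int[][]> tasks) {
        Set<int[]> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        long total = 0;
        for (int[][] params : tasks) {
            for (int[] p : params) {
                if (seen.add(p)) {
                    total += sizeInBytes(p);
                }
            }
        }
        return total;
    }

    public static void main(String[] args) {
        int[] a = new int[32];
        int[] b = new int[32];
        // Two tasks that share the same parameters, as t0 and t1 do.
        List<int[][]> tasks = List.of(new int[][] { a, b }, new int[][] { a, b });
        System.out.println("per-task sum:     " + perTaskSum(tasks));
        System.out.println("per-schedule sum: " + perScheduleSum(tasks));
    }
}
```

With two tasks sharing a and b, the per-task sum counts each array twice, while the per-schedule sum counts it once, which is the quantity that corresponds to the data actually transferred.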
How To Reproduce
To reproduce, run the test below with the -Dtornado.profiler=True flag:
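import uk.ac.manchester.tornado.api.TaskSchedule;
import uk.ac.manchester.tornado.api.annotations.Parallel;

public class Main {

    // Task t0: reads a, writes b.
    public static void add(int[] a, int[] b) {
        for (@Parallel int i = 0; i < a.length; i++) {
            b[i] = a[i] + a[i];
        }
    }

    // Task t1: reads a and b, writes b.
    public static void mult(int[] a, int[] b) {
        for (@Parallel int i = 0; i < b.length; i++) {
            b[i] = b[i] + a[i] * 3;
        }
    }

    public static void main(String[] args) {
        int n = 32;
        int[] a = new int[n];
        int[] b = new int[n];
        // Both tasks share the same parameters a and b.
        TaskSchedule ts = new TaskSchedule("s0")
                .task("t0", Main::add, a, b)
                .task("t1", Main::mult, a, b)
                .streamOut(b);
        ts.execute();
    }
}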
The output produced by the profiler is:
{
  "s0": {
    "TOTAL_DISPATCH_DATA_TRANSFERS_TIME": "51936",
    "TOTAL_TASK_SCHEDULE_TIME": "298600160",
    "TOTAL_DRIVER_COMPILE_TIME": "169628101",
    "TOTAL_GRAAL_COMPILE_TIME": "51983174",
    "TOTAL_KERNEL_TIME": "18176",
    "TOTAL_DISPATCH_KERNEL_TIME": "15200",
    "TOTAL_BYTE_CODE_GENERATION": "5949780",
    "COPY_IN_TIME": "4512",
    "COPY_OUT_TIME": "1888",
    "s0.t0": {
      "METHOD": "Main.add",
      "DEVICE_ID": "0:0",
      "DEVICE": "GeForce GTX 1650",
      "TASK_COPY_OUT_SIZE_BYTES": "152",
      "TASK_COPY_IN_SIZE_BYTES": "344",
      "TASK_COMPILE_GRAAL_TIME": "35695419",
      "TASK_KERNEL_TIME": "9984",
      "TASK_COMPILE_DRIVER_TIME": "88543309"
    },
    "s0.t1": {
      "METHOD": "Main.mult",
      "DEVICE_ID": "0:0",
      "DEVICE": "GeForce GTX 1650",
      "TASK_COPY_IN_SIZE_BYTES": "40",
      "TASK_COMPILE_GRAAL_TIME": "16287755",
      "TASK_KERNEL_TIME": "8192",
      "TASK_COMPILE_DRIVER_TIME": "81084792"
    }
  }
}
Even though objects a and b are used by both t0 and t1, TASK_COPY_OUT_SIZE_BYTES is reported only for t0, and TASK_COPY_IN_SIZE_BYTES is attributed almost entirely to t0 (344 bytes for t0 versus 40 bytes for t1).
Expected behavior
The expected output for the test above would be:
{
  "s0": {
    "TOTAL_DISPATCH_DATA_TRANSFERS_TIME": "51936",
    "TOTAL_TASK_SCHEDULE_TIME": "298600160",
    "TOTAL_DRIVER_COMPILE_TIME": "169628101",
    "TOTAL_GRAAL_COMPILE_TIME": "51983174",
    "TOTAL_KERNEL_TIME": "18176",
    "TOTAL_DISPATCH_KERNEL_TIME": "15200",
    "TOTAL_BYTE_CODE_GENERATION": "5949780",
    "COPY_IN_TIME": "4512",
    "COPY_OUT_TIME": "1888",
    "COPY_OUT_SIZE_BYTES": "XXXX",
    "COPY_IN_SIZE_BYTES": "XXXX",
    "s0.t0": {
      "METHOD": "Main.add",
      "DEVICE_ID": "0:0",
      "DEVICE": "GeForce GTX 1650",
      "TASK_COMPILE_GRAAL_TIME": "35695419",
      "TASK_KERNEL_TIME": "9984",
      "TASK_COMPILE_DRIVER_TIME": "88543309"
    },
    "s0.t1": {
      "METHOD": "Main.mult",
      "DEVICE_ID": "0:0",
      "DEVICE": "GeForce GTX 1650",
      "TASK_COMPILE_GRAAL_TIME": "16287755",
      "TASK_KERNEL_TIME": "8192",
      "TASK_COMPILE_DRIVER_TIME": "81084792"
    }
  }
}
Additional context
Also, Javadoc for the data structures of the profiler should be added.
Thank you @gigiblender for the report. This is by design. We want each task to have its own profiler. This is especially beneficial in a multi-task, multi-device environment. But I also agree that for some cases (e.g., using task.sync(objects)) the profiler should add the metrics at the task-schedule level rather than the task level.
I suggest refactoring this part to add both options.
- Add metrics for copies at the task-schedule level.
- When possible, keep the copy metrics for each individual task.
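One possible shape for supporting both options is sketched below. This is only an illustration under stated assumptions: the class DualLevelProfilerSketch, its Metric enum, and the two addValueToMetric overloads are hypothetical and do not reflect the actual TimeProfiler API; only the metric names are taken from the outputs in this issue.

```java
import java.util.EnumMap;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch (not the TornadoVM TimeProfiler): stores copy-size
// metrics at the task-schedule level and, optionally, per task.
final class DualLevelProfilerSketch {

    // Metric names borrowed from the profiler output shown in this issue.
    enum Metric {
        COPY_IN_SIZE_BYTES,
        COPY_OUT_SIZE_BYTES,
        TASK_COPY_IN_SIZE_BYTES,
        TASK_COPY_OUT_SIZE_BYTES
    }

    private final Map<Metric, Long> scheduleMetrics = new EnumMap<>(Metric.class);
    private final Map<String, Map<Metric, Long>> taskMetrics = new HashMap<>();

    // Schedule-level overload: no taskName required.
    void addValueToMetric(Metric metric, long value) {
        scheduleMetrics.merge(metric, value, Long::sum);
    }

    // Task-level overload: kept for cases where a copy can be attributed to one task.
    void addValueToMetric(Metric metric, String taskName, long value) {
        taskMetrics
                .computeIfAbsent(taskName, name -> new EnumMap<>(Metric.class))
                .merge(metric, value, Long::sum);
    }

    // Returns 0 when nothing was recorded for the metric.
    long getScheduleMetric(Metric metric) {
        return scheduleMetrics.getOrDefault(metric, 0L);
    }

    // Returns 0 when the task or the metric is unknown.
    long getTaskMetric(String taskName, Metric metric) {
        Map<Metric, Long> perTask = taskMetrics.get(taskName);
        return perTask == null ? 0L : perTask.getOrDefault(metric, 0L);
    }
}
```

In this layout a caller that knows which task triggered a transfer can record it twice, once per level, while a transfer that belongs to the whole schedule is recorded only through the overload without a taskName.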