TornadoVM icon indicating copy to clipboard operation
TornadoVM copied to clipboard

Refactor the TimeProfiler

Open gigiblender opened this issue 3 years ago • 1 comments

Describe the bug Currently, this method (addValueToMetric) expects a taskName to be passed. The addValueToMetric method is called for the TASK_COPY_IN_SIZE_BYES and TASK_COPY_OUT_SIZE_BYTES profile type, which is not correct. Objects passed as parameters to tasks belong to the whole task schedule context (i.e multiple tasks that belong to the same task schedule and use the same object parameter will result in a single COPY_IN/STREAM_IN).

Therefore, I think the TASK_COPY_IN/OUT_SIZE_BYTES should be profiled per task schedule, and not individual tasks.

How To Reproduce To reproduce, run the test below with the -Dtornado.profiler=True flag

    public static void add(int[] a, int[] b) {
        for (@Parallel int i = 0; i < a.length; i++) {
            b[i] = a[i] + a[i];
        }
    }

    public static void mult(int[] a, int[] b) {
        for (@Parallel int i = 0; i < b.length; i++) {
            b[i] = b[i] + a[i] * 3;
        }
    }

    public static void main(String[] args) {
        int n = 32;
        int[] a = new int[n];
        int[] b = new int[n];

        TaskSchedule ts = new TaskSchedule("s0")
                .task("t0", Main::add, a, b)
                .task("t1", Main::mult, a, b)
                .streamOut(b);

        ts.execute();
    }

The output produced by the profiler is:

{
    "s0": {
        "TOTAL_DISPATCH_DATA_TRANSFERS_TIME": "51936",
        "TOTAL_TASK_SCHEDULE_TIME": "298600160",
        "TOTAL_DRIVER_COMPILE_TIME": "169628101",
        "TOTAL_GRAAL_COMPILE_TIME": "51983174",
        "TOTAL_KERNEL_TIME": "18176",
        "TOTAL_DISPATCH_KERNEL_TIME": "15200",
        "TOTAL_BYTE_CODE_GENERATION": "5949780",
        "COPY_IN_TIME": "4512",
        "COPY_OUT_TIME": "1888",
        "s0.t0": {
            "METHOD": "Main.add",
            "DEVICE_ID": "0:0",
            "DEVICE": "GeForce GTX 1650",
            "TASK_COPY_OUT_SIZE_BYTES": "152",
            "TASK_COPY_IN_SIZE_BYTES": "344",
            "TASK_COMPILE_GRAAL_TIME": "35695419",
            "TASK_KERNEL_TIME": "9984",
            "TASK_COMPILE_DRIVER_TIME": "88543309"
        }, 
        "s0.t1": {
            "METHOD": "Main.mult",
            "DEVICE_ID": "0:0",
            "DEVICE": "GeForce GTX 1650",
            "TASK_COPY_IN_SIZE_BYTES": "40",
            "TASK_COMPILE_GRAAL_TIME": "16287755",
            "TASK_KERNEL_TIME": "8192",
            "TASK_COMPILE_DRIVER_TIME": "81084792"
        }
    }
}

Even though objects a and b are used by both t0 and t1, the TASK_COPY_IN_SIZE_BYTES and TASK_COPY_OUT_SIZE_BYTES are only reported for t0.

Expected behavior The expected output for the test above would be:


{
    "s0": {
        "TOTAL_DISPATCH_DATA_TRANSFERS_TIME": "51936",
        "TOTAL_TASK_SCHEDULE_TIME": "298600160",
        "TOTAL_DRIVER_COMPILE_TIME": "169628101",
        "TOTAL_GRAAL_COMPILE_TIME": "51983174",
        "TOTAL_KERNEL_TIME": "18176",
        "TOTAL_DISPATCH_KERNEL_TIME": "15200",
        "TOTAL_BYTE_CODE_GENERATION": "5949780",
        "COPY_IN_TIME": "4512",
        "COPY_OUT_TIME": "1888",
        "COPY_OUT_SIZE_BYTES": "XXXX",
        "COPY_IN_SIZE_BYTES": "XXXX",
        "s0.t0": {
            "METHOD": "Main.add",
            "DEVICE_ID": "0:0",
            "DEVICE": "GeForce GTX 1650",
            "TASK_COMPILE_GRAAL_TIME": "35695419",
            "TASK_KERNEL_TIME": "9984",
            "TASK_COMPILE_DRIVER_TIME": "88543309"
        }, 
        "s0.t1": {
            "METHOD": "Main.mult",
            "DEVICE_ID": "0:0",
            "DEVICE": "GeForce GTX 1650",
            "TASK_COMPILE_GRAAL_TIME": "16287755",
            "TASK_KERNEL_TIME": "8192",
            "TASK_COMPILE_DRIVER_TIME": "81084792"
        }
    }
}

Additional context Also, Javadoc for the datastructures of the profiler should be added.

gigiblender avatar Apr 26 '21 10:04 gigiblender

Thank you @gigiblender for the report. This is by design. We want each task to have its own profiler. This is especially beneficial in a multi-task , multi-device environment. But I also agree that for some cases (e.g., using the task.sync(objects) ) the profiler should add the metrics at the task-schedule level, rather than the task-level.

I suggest refactoring this part to add both options.

  • Add metrics for copies at the task-schedule level
  • When possible, keep the copy metrics for each individual task.

jjfumero avatar Apr 29 '21 09:04 jjfumero