
[Proposal] Pass all method parameters through the call stack and improve code cache strategy

Open gigiblender opened this issue 4 years ago • 0 comments

Currently, we have two different strategies for caching compilation results, one for each backend (PTX, OpenCL).

For the PTX backend, the cache key relies on the identity of the function parameters passed to the task. The drawback is that a recompilation is triggered every time a parameter object changes.

For the OpenCL backend, the key to the code cache is scheduleNo.taskNo-methodName. This can cause conflicts when multiple task schedules with the same name are created in different scopes.
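To make the two failure modes concrete, here is a minimal plain-Java sketch of both keying schemes. It is only an illustration under stated assumptions: `CacheKeySketch`, `identityKey` and `nameKey` are hypothetical names, not TornadoVM classes or methods, and the real caches store compiled code rather than strings.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in TornadoVM.
class CacheKeySketch {

    // PTX-like key: method name plus the parameter objects themselves.
    // Java arrays use identity-based equals/hashCode, so two lists match
    // only when they hold the very same array instances.
    static List<Object> identityKey(String methodName, Object... params) {
        Object[] parts = new Object[params.length + 1];
        parts[0] = methodName;
        System.arraycopy(params, 0, parts, 1, params.length);
        return Arrays.asList(parts);
    }

    // OpenCL-like key: schedule name, task name and method name only.
    static String nameKey(String schedule, String task, String methodName) {
        return schedule + "." + task + "-" + methodName;
    }

    public static void main(String[] args) {
        int[] in1 = new int[1024], out1 = new int[1024 * 1024];
        int[] in2 = new int[512], out2 = new int[512 * 512];

        // Identity-based cache: new parameter objects always miss.
        Map<List<Object>, String> ptxLikeCache = new HashMap<>();
        ptxLikeCache.put(identityKey("testMethod", in1, out1), "kernel #1");
        System.out.println(ptxLikeCache.containsKey(identityKey("testMethod", in2, out2))); // false

        // Name-based cache: a second schedule also called "s0" hits the old entry.
        Map<String, String> openclLikeCache = new HashMap<>();
        openclLikeCache.put(nameKey("s0", "t0", "testMethod"), "kernel with 1024 inlined");
        System.out.println(openclLikeCache.containsKey(nameKey("s0", "t0", "testMethod"))); // true
    }
}
```

The first lookup misses although the task code is unchanged, which forces a recompilation; the second lookup hits although the cached kernel was specialised for different inputs.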

For example, consider the code below:

    static class Data {
        int[] inTor;
        int[] outTor;
        int[] inSeq;
        int[] outSeq;

        public Data(int inTorSize, int outTorSize) {
            Random random = new Random();

            inTor = new int[inTorSize];
            outTor = new int[outTorSize];
            for (int i = 0; i < inTorSize; i++) {
                inTor[i] = random.nextInt();
            }
            for (int i = 0; i < outTorSize; i++) {
                outTor[i] = random.nextInt();
            }

            inSeq = inTor.clone();
            outSeq = outTor.clone();
        }
    }

    public static void testMethod(int[] in, int[] out) {
        for (@Parallel int i = 0; i < in.length; i++) {
            out[i] = in[i];
        }

    }

    public static void testMethod2(int[] in, int[] out) {
        for (@Parallel int i = 0; i < in.length; i++) {
            out[i] = in[i];
        }

    }

    public static void main(String[] args) {
        int N1 = 1024;

        // FIRST SCOPE
        {
            Data data = new Data(N1, N1 * N1);
            TaskSchedule ts = new TaskSchedule("s0")
                    .task("t0", Main::testMethod, data.inTor, data.outTor)
                    .task("t1", Main::testMethod2, data.inTor, data.outTor)
                    .streamOut(data.inTor, data.outTor);

            ts.execute();
        }

        // SECOND SCOPE
        {
            N1 = N1 / 2;                                          // <---------- Use different input objects and size
            Data data = new Data(N1, N1 * N1);
            TaskSchedule ts = new TaskSchedule("s0")
                    .task("t0", Main::testMethod, data.inTor, data.outTor)
                    .task("t1", Main::testMethod2, data.inTor, data.outTor)
                    .streamOut(data.inTor, data.outTor);

            ts.execute();
        }
    }

The OpenCL backend will not recompile the first task, t0, in the second scope, and will therefore use the wrong inlined array length (the in.length value) in the kernel. The reason the second task, t1, is recompiled is a side effect from here. The PTX backend will trigger 4 compilations in total.

I think the way to solve this is to pass all the task parameters (primitives and object references) through the call stack. We might also need to stop inlining array lengths in the @Parallel-annotated loops (for(;i_3 < 1024;)).
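The difference between an inlined array length and one passed as an argument can be sketched in plain Java. This is a stand-in under stated assumptions, not generated TornadoVM code: `InlinedLengthSketch`, `compileWithInlinedLength`, `compileWithLengthParameter` and `LengthAwareKernel` are hypothetical names, and lambdas play the role of compiled kernels.

```java
import java.util.function.BiConsumer;

// Hypothetical illustration only; lambdas stand in for compiled kernels.
class InlinedLengthSketch {

    // Stand-in for a kernel whose loop bound was folded into a constant
    // at "compile" time, like for(;i_3 < 1024;).
    static BiConsumer<int[], int[]> compileWithInlinedLength(int[] in) {
        final int inlinedLength = in.length; // frozen when the kernel is built
        return (src, dst) -> {
            for (int i = 0; i < inlinedLength; i++) {
                dst[i] = src[i];
            }
        };
    }

    // Stand-in for a kernel that receives the length with every launch.
    interface LengthAwareKernel {
        void run(int[] src, int[] dst, int length);
    }

    static LengthAwareKernel compileWithLengthParameter() {
        return (src, dst, length) -> {
            for (int i = 0; i < length; i++) {
                dst[i] = src[i];
            }
        };
    }

    public static void main(String[] args) {
        int[] first = new int[1024];
        int[] second = new int[512];

        // Reusing the kernel built for the first input on the smaller input
        // runs past the end of the array. Java throws; a device kernel would
        // instead perform an out-of-bounds access.
        BiConsumer<int[], int[]> cached = compileWithInlinedLength(first);
        try {
            cached.accept(second, new int[512 * 512]);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("stale kernel read past the end of the input");
        }

        // The length-aware kernel can be reused safely for any input size.
        compileWithLengthParameter().run(second, new int[512 * 512], second.length);
    }
}
```

With the length supplied at launch time, the compiled code no longer depends on the particular arrays, so a cache entry could in principle be shared across parameter changes.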

gigiblender, Apr 09 '21 13:04