MIOpenGEMM
How could I save the params and OpenCL kernel after geometry?
@newling I'm new to MIOpenGEMM. Could you tell me how to save the params and the OpenCL kernel for a given geometry? Sorry for raising this as an issue :) but I don't know if there is another way to ask usage questions. Many thanks in advance! -jack
Hi Jack,
If you want to make MIOpenGEMM run fast for a particular Geometry (say M=10, N=5, K=4096, tA=tB=0), take a look at: examples/find.cpp.
If you want to actually see OpenCL kernels (which isn't necessary for most users of MIOpenGEMM), take a look at: examples/print.cpp
Please let me know if you have any follow-up questions.
@newling Thanks! I will look over it and come back to you if any more issues!
@newling
Hi James, thanks again for your help! I've run into a new issue, though :)
I call the API like this:
MIOpenGEMM::gemm0
Here M=10, N=4096, K=25088, TransA=false, TransB=true, alpha=beta=1.
The issue is that it took about 564.23 milliseconds, while clBLAS takes only 0.82 milliseconds. Do you know why? Any hints?
One-off kernel performance tests like this are tricky to get right in my experience. One thing to note is that the first call to gemm with a given Geometry (M=10, N=4096, K=25088, TransA=false, TransB=true, alpha=beta=1) is slow, as the kernel is generated and compiled. Best practice is always to have at least one warm-up run when benchmarking. Not sure if you're doing this?
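A minimal sketch of a warm-up-then-time helper, in case it is useful. The name `mean_ms_after_warmup` and the callable `fn` are placeholders of mine, not part of MIOpenGEMM; `fn` stands for whatever launches the GEMM and waits for it to finish (for OpenCL that would mean blocking until the queue has drained before the timer is read).

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Calls `fn` `warmups` times without timing it, then returns the mean
// wall-clock time in milliseconds over `runs` timed calls.
// `fn` is a hypothetical stand-in: it must launch the work AND block
// until it has finished, otherwise only the launch cost is measured.
template <typename Fn>
double mean_ms_after_warmup(Fn&& fn, std::size_t warmups, std::size_t runs)
{
  assert(runs > 0);  // avoid dividing by zero below

  // Untimed warm-up calls absorb one-time costs such as kernel compilation.
  for (std::size_t i = 0; i < warmups; ++i)
  {
    fn();
  }

  using clock   = std::chrono::steady_clock;
  const auto t0 = clock::now();
  for (std::size_t i = 0; i < runs; ++i)
  {
    fn();
  }
  const auto t1 = clock::now();

  const std::chrono::duration<double, std::milli> elapsed = t1 - t0;
  return elapsed.count() / static_cast<double>(runs);
}
```

With `warmups` set to 1 the compile cost of the first call stays out of the reported number; averaging over several `runs` also smooths out timer noise.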
@newling It looks like you mean it needs to try some parameters and then pick the best one? Is there an example of how to do this?
@newling It looks like there's an example in "MIOpenDriver gemm" for doing this, but I can't follow its source code. Could you show me a quick example? There may be something in find.cpp, but it's not very straightforward. Could you write me a small piece of sample code?
Thanks in advance, -jack
No, gemm0 does not "try some parameters", but it still needs to compile the OpenCL kernel (I think clBLAS has its kernels precompiled).
The function gemm0 is basically the same as clBLAS gemm. One place to see it used is deepbench.cpp, line 287. It should be straightforward: just call gemm0 once before you start timing.
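To illustrate why only the first call pays the compile cost, here is a hypothetical sketch of compile-on-first-use caching inside a single process. It is not MIOpenGEMM's implementation: `Geometry`, `KernelCache` and `build` are names made up for this example, the "kernel" is just a string, and keying the cache by geometry is an assumption on my part.

```cpp
#include <map>
#include <string>
#include <tuple>

// Hypothetical problem description, for illustration only.
struct Geometry
{
  int  m, n, k;
  bool tA, tB;
};

// Ordering so that Geometry can be used as a std::map key.
inline bool operator<(const Geometry& a, const Geometry& b)
{
  return std::tie(a.m, a.n, a.k, a.tA, a.tB) < std::tie(b.m, b.n, b.k, b.tA, b.tB);
}

// Hypothetical in-memory cache: a kernel is "compiled" the first time
// it is requested and reused on every later request.
class KernelCache
{
  public:
  const std::string& get(const Geometry& g)
  {
    auto it = cache_.find(g);
    if (it == cache_.end())
    {
      ++compile_count_;  // the slow path, taken once per cache entry
      it = cache_.emplace(g, build(g)).first;
    }
    return it->second;
  }

  int compile_count() const { return compile_count_; }

  private:
  // Placeholder for the expensive generate-and-compile step.
  static std::string build(const Geometry& g)
  {
    return "gemm_" + std::to_string(g.m) + "x" + std::to_string(g.n) + "x" + std::to_string(g.k);
  }

  std::map<Geometry, std::string> cache_;
  int                             compile_count_ = 0;
};
```

Calling `get` twice with the same `Geometry` leaves `compile_count()` at 1, which is the behaviour a warm-up call relies on.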
@newling You are right, I tried it. My next question: I assume the compiled OpenCL kernel is stored somewhere. How long does it stay there? That is, if I don't clear any "cache", will the compiled kernel always be there? Where is it stored?
@newling One more thing: I assume I only need to call gemm0 once with any parameter set, and the second time I call gemm0, even with different parameters, it will be fast, right? So that first call to gemm0 is the "warm-up"?
@jackyh Yep, correct. Sorry for the slow reply.