MIOpenGEMM

How could I save the params and OpenCL kernel after geometry?

Open jackyh opened this issue 6 years ago • 10 comments

@newling I'm a newbie to MIOpenGEMM. Could you tell me how to save the params and OpenCL kernel after doing geometry? Sorry for raising this as an issue :) but I don't know if there is another way to ask usage questions. Many thanks in advance! -jack

jackyh, Jun 02 '18 15:06

Hi Jack,

If you want to make MIOpenGEMM run fast for a particular Geometry (say M=10, N=5, K=4096, tA=tB=0), take a look at: examples/find.cpp.

If you want to actually see OpenCL kernels (which isn't necessary for most users of MIOpenGEMM), take a look at: examples/print.cpp

Please let me know if you have any follow-up questions.

newling, Jun 04 '18 17:06

@newling Thanks! I will look it over and get back to you if I run into any more issues!

jackyh, Jun 06 '18 02:06

@newling

Hi James, thanks again for your help! I've run into a new issue, though :)

I call the API like this:

    MIOpenGEMM::gemm0(extSolution->order, extSolution->transA, extSolution->transB,
                      extProblem->M, extProblem->N, extProblem->K,
                      extSolution->alpha,
                      d_a, 0, extSolution->lda,
                      d_b, 0, extSolution->ldb,
                      extSolution->beta,
                      d_bias, 0, extSolution->ldc,
                      &(l_Runtime->GetStream())[0], 0, NULL, NULL);

Here: M=10, N=4096, K=25088, TransA=false, TransB=true, alpha=beta=1

The issue is that the call took about 564.23 milliseconds, while the same problem with clBLAS takes only 0.82 milliseconds. Do you know why? Any hints?

jackyh, Jun 07 '18 13:06

These one-off kernel performance tests are tricky to get right, in my experience. One thing to note is that the first call to gemm with a given Geometry (M=10, N=4096, K=25088, TransA=false, TransB=true, alpha=beta=1) is slow because the kernel is generated and compiled. Best practice is to always have at least one warm-up run when benchmarking. I'm not sure if you're doing this?

newling, Jun 08 '18 22:06

@newling It looks like you mean that it needs to try some parameters and then pick the best one? Is there an example of how to do this?

jackyh, Jun 09 '18 09:06

@newling It looks like there is an example of this in "MIOpenDriver gemm", but I am not able to understand its source code. Could you show me a quick example? Maybe there is one in find.cpp, but that's not very straightforward. Could you write me a small piece of sample code?

Thanks in advance, -jack

jackyh, Jun 09 '18 10:06

No, gemm0 does not "try some parameters", but it still needs to compile the OpenCL kernel (I think clBLAS has its kernels precompiled).

The function gemm0 is basically the same as clBLAS gemm. One place to look where it is used is in deepbench.cpp, line 287. It should be straightforward, just call gemm0 once before you start timing.
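A minimal sketch of that warm-up pattern, for illustration only. `FakeGemm` is a hypothetical stand-in that simulates a one-time compile cost on its first call; it is not the MIOpenGEMM API, and `time_ms` is an illustrative helper name. In a real benchmark the callable would wrap your gemm0 call, and if the call only enqueues work on the OpenCL queue you would also need to wait for it to finish before reading the clock.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical stand-in for a GEMM call whose first invocation pays a
// one-time kernel compilation cost. This is NOT the MIOpenGEMM API.
struct FakeGemm {
    bool compiled = false;
    int compile_count = 0;

    void operator()() {
        if (!compiled) {
            // Simulated kernel generation and compilation (first call only).
            std::this_thread::sleep_for(std::chrono::milliseconds(50));
            compiled = true;
            ++compile_count;
        }
        // Simulated kernel execution.
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
};

// Returns the average wall-clock time per call in milliseconds over `runs`
// timed calls, after `warmups` untimed calls.
double time_ms(const std::function<void()>& call, int warmups, int runs) {
    for (int i = 0; i < warmups; ++i) {
        call();  // untimed: absorbs any one-time setup cost
    }
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < runs; ++i) {
        call();
    }
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count() / runs;
}
```

For example, `time_ms([&g] { g(); }, 0, 1)` on a fresh `FakeGemm g` includes the simulated compile, while `time_ms([&g] { g(); }, 1, 5)` reports only the steady-state cost. Averaging over several timed runs also smooths out timer noise.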

newling, Jun 09 '18 11:06

@newling You are right, I tried it. My next question: I assume the compiled OpenCL kernel is stored somewhere. How long does it stay there? That is, if I don't clear any "cache", will the compiled kernel always be there? And where is it stored?

jackyh, Jun 09 '18 14:06

@newling One more thing: I assume I only need to call gemm0 once with any parameter set, and that a second call to gemm0 with different parameters will then be fast, right? So that first call to gemm0 is the "warm-up"?

jackyh, Jun 09 '18 14:06

@jackyh Yep, correct. Sorry for the slow reply.

newling, Jul 19 '18 06:07