Giuseppe Rossini
Hi @tqchen, I will try to comment sporadically, since this is a project I prototyped (and enjoyed :) ) when I was at Arm. If I understand your comment...
Hi @pfultz2, may I ask why this is not merged yet? It certainly sorted the issues I was facing with an internal proxy :)!
Did this get solved? I am hitting the exact same error :)
It's a bit strange that we see different timings from migraphx. Did you set `HIP_FORCE_DEV_KERNARG=1`? Also, are you using ROCm 6.1 (rocm-6.1.0-445)?
Hi @shabalind, thanks for the input! I don't think #93, #94, #95 are blockers, because those are only needed for us to have an easier life when analyzing performance...
Hi @shabalind, it really seems easy now, thanks! I will have a stab at this in the next few days and cc you in the review. Thanks, Giuseppe
Funny, your tile sizes:
```
tile_sizes1=[128, 384, 512],
tile_sizes2=[12, 32, 1],
```
are not too dissimilar to mine:
```
tile_sizes1=[192, 8192, 512],
tile_sizes2=[12, 8, 1],
```
I also found that...
Just to be on the same page, I am calling `k_c` the outer tiling on `K` (in the example above, `512`) and `k_r` the inner tiling on `K` (in the example...
@nicolasvasilache Ah, I see what you mean (maybe). I could tile and peel K by 512 so that we have a full separation when I apply pipelining. Doesn't seem like a bad...
Ok, I have implemented it the way you suggested, and you were right: it was simple and performance is back on track. Unfortunately, I think that we might have...