Preston Jiang
I tried larger tensors, but the `CuArray` version just takes too long to run. So it does look like the runtime gets worse with size.
Thanks for testing! I rewrote the code with `reshape` and `batched_mul`, which is faster on GPUs. I guess I will leave this open for now :)
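For anyone landing here later, this is a rough sketch of what a `reshape` + `batched_mul` rewrite can look like. The original contraction isn't shown in this thread, so the shapes, the contraction itself, and the function name `contract_batched` are all assumptions for illustration. I am also assuming `batched_mul` is the one from NNlib. It runs on plain `Array`s here; moving the inputs to the GPU (e.g. as `CuArray`s) should be the only change needed, but that part is untested in this sketch.

```julia
using NNlib  # assumed source of batched_mul

# Hypothetical contraction (not from the thread):
#   C[i, j, k, b] = sum over l of A[i, j, l, b] * X[l, k, b]
function contract_batched(A::AbstractArray{T,4}, X::AbstractArray{T,3}) where {T}
    ni, nj, nl, nb = size(A)
    nk = size(X, 2)
    # Merge the two leading dimensions so every batch slice is a matrix.
    # reshape does not copy; it only reinterprets the column-major layout.
    A2 = reshape(A, ni * nj, nl, nb)
    # One batched matrix product: (ni*nj, nl, nb) times (nl, nk, nb) -> (ni*nj, nk, nb)
    C2 = batched_mul(A2, X)
    # Split the merged dimension back into (ni, nj).
    return reshape(C2, ni, nj, nk, nb)
end

# Small example with made-up sizes.
A = rand(Float32, 3, 4, 5, 7)
X = rand(Float32, 5, 6, 7)
C = contract_batched(A, X)
```

The idea is to replace elementwise loops or broadcasts over the tensor with a single batched matrix multiply, which tends to map onto one library call on the GPU instead of many small operations.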