Preston Jiang

2 comments by Preston Jiang

I tried larger tensors, but the `CuArray` version just takes too long to run. So it does look like the runtime gets worse with size...

Thanks for testing! I did rewrite the code with `reshape` and `batched_mul`, which is faster on the GPU. I guess I will leave this open for now :)
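
For anyone following along, here is a rough sketch of the `reshape` + `batched_mul` idea. It is not the actual code from this thread: the array names and shapes are made up for illustration, and it assumes `batched_mul` from NNlib.jl, which multiplies matching slices of two 3-D arrays. Extra batch dimensions are merged into one with `reshape`, the batched product is taken, and the result is reshaped back.

```julia
using NNlib  # assumed dependency; provides batched_mul

# Hypothetical shapes, chosen only for illustration.
A = rand(Float32, 4, 5, 6, 7)   # indices (i, j, b1, b2)
B = rand(Float32, 5, 3, 6, 7)   # indices (j, k, b1, b2)

# Merge the two trailing batch dimensions into one, since
# batched_mul works on 3-D arrays.
A3 = reshape(A, 4, 5, :)        # (i, j, b1*b2)
B3 = reshape(B, 5, 3, :)        # (j, k, b1*b2)

# One matrix product per batch slice: C3[:, :, n] = A3[:, :, n] * B3[:, :, n]
C3 = batched_mul(A3, B3)

# Restore the original batch dimensions: (i, k, b1, b2)
C = reshape(C3, 4, 3, 6, 7)

# Reference result from an explicit loop, for comparison only.
C_ref = similar(C)
for b2 in 1:7, b1 in 1:6
    C_ref[:, :, b1, b2] = A[:, :, b1, b2] * B[:, :, b1, b2]
end
```

The sketch runs on plain CPU arrays so it stays self-contained. On a GPU the same calls should work if the inputs are `CuArray`s (for example, moved over with `cu` from CUDA.jl), though I have not benchmarked this particular version.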