cloud11665

Results 59 comments of cloud11665

installing pycuda and compilation now take up most of the time :)

Installing deps takes so long because of pycuda which is not installed in any other test. As for moving that into core tinygrad, should it live in ops_cuda or some...

!!! ![image](https://github.com/geohot/tinygrad/assets/59028866/b3c40dfa-5aa2-4534-aebb-e742889a60ad) sub 8 minutes !

Does the CI config file count, as that alone is about 50 ?

Simply casting the true and false branches of the ternary statement got rid of the issue. It also passes `test/test_dtype.py::TestHalfDtype::test_int8_matmul_upcast_half`. The only test left is the stupid int8 -> uint8 saturation one: test/test_dtype.py .....................................F........
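For illustration, here is roughly what "casting both branches" could look like on the codegen side. This is only a sketch: `render_ternary` and its parameters are hypothetical names made up for this example, not tinygrad's actual renderer, and it assumes the kernel source is built up as C-style strings.

```python
# Hypothetical helper, NOT tinygrad's real code generator.
# Assumption: kernel source is assembled from C-style expression strings.
def render_ternary(cond: str, if_true: str, if_false: str, ctype: str) -> str:
    """Render a C-style ternary with both branches explicitly cast to `ctype`,
    so the compiler cannot pick a different common type for the result."""
    return f"({cond} ? ({ctype})({if_true}) : ({ctype})({if_false}))"


expr = render_ternary("x > 0", "a", "b", "half")
print(expr)  # (x > 0 ? (half)(a) : (half)(b))
```

Casting each branch, rather than the whole ternary, keeps both operands the same type before the compiler applies its usual conversion rules to the expression.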

Oops, I was also casting to float4, and given its implementation quirks, that broke the opencl tests.

I'm getting 1100mspt on a 3090 with `JIT=1 OPT=4 OPTLOCAL=2` with cuda and 180mspt with opencl. It's not looking too good for cuda atm, I'll have to investigate it further,...

should the half4 stuff be inlined into CUDAProgram then ? Imo it's not worth it as that'd make the output much more noisy. On the other hand, we could solve...

oh, but wasn't the ignoring of casts a known issue ?