cloud11665
Check out this thread on tests in general in the discord: https://discord.com/channels/1068976834382925865/1117201473596567682/1119787496809701521
`./tinygrad/lazy.py:PUSH_PERMUTES, PUSH_CONTIGUOUS = OPT>=3, OPT>=3`
This was mostly fixed in https://github.com/geohot/tinygrad/commit/2407690d821cfbb1747d5bf8088a4af3e5ac0769. As for the cast itself, we'd have to check if the instruction has the `.sat` modifier (`cvt.sat.s8.u8`)
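A minimal sketch of what that check could look like (hypothetical helper, not tinygrad code) — in PTX, `.sat` shows up as a dotted modifier inside the opcode token, as in `cvt.sat.s8.u8`:

```python
def cvt_has_sat(ptx_line: str) -> bool:
    """Return True if a PTX `cvt` instruction carries the `.sat` modifier.

    Example lines: 'cvt.sat.s8.u8 %rs1, %rs2;' vs 'cvt.s8.u8 %rs1, %rs2;'.
    """
    # the opcode is the first whitespace-separated token, e.g. "cvt.sat.s8.u8"
    opcode = ptx_line.strip().split()[0]
    # modifiers and types are dot-separated components of the opcode
    return opcode.startswith("cvt") and "sat" in opcode.split(".")
```

This only looks at the opcode token, so operands that happen to contain "sat" can't cause false positives.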
@b7r6 I think this is a great learning experience, and I'd be happy to answer any questions about CUDA / tinygrad on the discord!
```
===================================================================== short test summary info =====================================================================
FAILED test/test_dtype.py::TestHalfDtype::test_half_matmul - pycuda._driver.LogicError: cuModuleLoadDataEx failed: a PTX JIT compilation failed - ptxas application ptx input, line 27; error : Unexpected instruction ty......
```
oh, so we don't want to format it (fix indentation on args and instructions, add newlines for params)?
> I'd merge the colorizer (if it's clear from code it's just a colorizer), I think the formatting is fine from what I've seen. The risk is that it misprints...
 llama
I've added parallelization of all global+local loops, as there were cases where we were missing out on performance due to n_cores > loop_idx.
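A toy sketch of the idea (hypothetical, not tinygrad code, and the names `GLOBAL`/`LOCAL` are assumptions): if only the outer loop is parallelized and its trip count is smaller than the core count, cores sit idle; flattening the global+local index spaces into one range gives every core work.

```python
from concurrent.futures import ThreadPoolExecutor

GLOBAL, LOCAL, N_CORES = 4, 8, 16  # outer trip count (4) < core count (16)

def work(flat_idx: int) -> int:
    # recover the (global, local) pair from the flattened index
    g, l = divmod(flat_idx, LOCAL)
    return g * LOCAL + l  # stand-in for the real kernel body

# parallelize over the flattened GLOBAL*LOCAL range, not just GLOBAL
with ThreadPoolExecutor(max_workers=N_CORES) as ex:
    results = list(ex.map(work, range(GLOBAL * LOCAL)))
```

With 32 flattened work items, all 16 workers can be busy, whereas parallelizing only the outer loop of 4 would leave 12 idle.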
I can take a look at this after getting access to a 40-series GPU
I currently have a 2080, and I will have access to a 4070 in ~2-3 weeks