chenyu

Results: 33 comments by chenyu

@wozeparrot this is done

Closing this as complete; it predates all the dtype work. We support float16, int8, and nf4 for llama now (no int8 tensor cores though), and it's straightforward to add more.
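For context, here is a minimal sketch of what per-tensor symmetric int8 weight quantization looks like in principle. This is plain numpy, not tinygrad's actual llama code, and the function names are hypothetical:

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    # per-tensor symmetric quantization: one float scale for the whole tensor
    scale = float(np.abs(weights).max()) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    # recover approximate float weights from the int8 values and the scale
    return q.astype(np.float32) * scale

if __name__ == "__main__":
    w = np.random.randn(4, 4).astype(np.float32)
    q, scale = quantize_int8(w)
    print("max abs error:", np.abs(w - dequantize_int8(q, scale)).max())
```

float16 and nf4 follow the same store-small, compute-larger pattern with different encodings.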

Closing as stale. The lazy engine has changed a lot. Limiting buffers per kernel is tracked in #1461. Notably, this piece of code now runs without problems: ``` from tinygrad.tensor...

Closing as stale - on master, both the interpreted and compiled backends honor the middle downcast.
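A small numeric illustration of why the middle downcast has to be honored (plain numpy standing in for the backends; this is not tinygrad code): a float32 -> float16 -> float32 chain must round through float16 precision rather than silently staying in float32.

```python
import numpy as np

x = np.float32(0.1)
honored = np.float32(np.float16(x))  # middle downcast applied: rounds to fp16 precision
skipped = np.float32(x)              # middle downcast dropped: keeps full fp32 precision
print(honored, skipped, honored == skipped)  # the fp16 rounding is visible, so they differ
```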

Closing as stale. `PYOPENCL_COMPILER_OUTPUT=1 DEBUG=3 CPU=0 OPT=3 python -m pytest -s -v -n=auto test/test_ops.py::TestOps::test_ceil -n=1` passes on master, and PUSH_PERMUTES and PUSH_CONTIGUOUS have been removed.

I can run `METAL=1 MODEL=mrcnn python3 examples/mlperf/model_eval.py` on master. While we don't have general per-kernel buffer limiting yet, the `OPT` options discussed here have been deprecated, so I am...

Closing as obsolete. extra/utils.py has been refactored and removed.

I did not. I just ran `test/external/fuzz_symbolic.py` 10k times and the results all matched.
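For anyone unfamiliar with the idea, here is a toy sketch of the kind of check such a fuzzer performs: build random expressions, simplify them, and assert the simplified form evaluates to the same value as the original at random points. This is a self-contained illustration of the pattern, not the actual test/external/fuzz_symbolic.py:

```python
import random, operator

OPS = [(operator.add, "+"), (operator.sub, "-"), (operator.mul, "*")]

def random_expr(depth):
    # leaf: either the variable x or a small integer constant
    if depth == 0 or random.random() < 0.3:
        return ("x",) if random.random() < 0.5 else ("const", random.randint(-5, 5))
    op, name = random.choice(OPS)
    return ("op", op, name, random_expr(depth - 1), random_expr(depth - 1))

def evaluate(node, x):
    # direct evaluation of the expression tree at integer x
    if node[0] == "x": return x
    if node[0] == "const": return node[1]
    _, op, _, lhs, rhs = node
    return op(evaluate(lhs, x), evaluate(rhs, x))

def simplify(node):
    # toy simplifier: fold constant subtrees and apply a few algebraic identities
    if node[0] in ("x", "const"): return node
    _, op, name, lhs, rhs = node
    lhs, rhs = simplify(lhs), simplify(rhs)
    if lhs[0] == "const" and rhs[0] == "const": return ("const", op(lhs[1], rhs[1]))
    if name == "*" and ("const", 0) in (lhs, rhs): return ("const", 0)
    if name == "*" and rhs == ("const", 1): return lhs
    if name == "+" and rhs == ("const", 0): return lhs
    return ("op", op, name, lhs, rhs)

if __name__ == "__main__":
    for _ in range(10_000):
        expr, x = random_expr(depth=4), random.randint(-100, 100)
        assert evaluate(expr, x) == evaluate(simplify(expr), x)
    print("all 10k random expressions matched")
```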

The METAL test behaves differently on my local M1 vs CI; it might be related to another Metal issue. WebGPU fails because it's using the METAL backend. I marked this as draft for now...

I have a better way to improve this, and I want to separate (1) enabling JIT tests in CI for all backends from (2) improving the JIT tests (currently some tests still...