chenyu

Results: 33 comments by chenyu

@wozeparrot this is done

Closing this as complete; it predates all the dtype work. We support float16, int8, and nf4 for llama now (no int8 tensor cores though), and it's straightforward to add more.
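For context, here is a minimal sketch of what per-tensor symmetric int8 weight quantization looks like in principle. This is plain numpy, not tinygrad's actual llama code, and the function names are hypothetical:

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    # per-tensor symmetric quantization: one float scale for the whole tensor
    scale = float(np.abs(weights).max()) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    # recover approximate float weights from the int8 values and the scale
    return q.astype(np.float32) * scale

if __name__ == "__main__":
    w = np.random.randn(4, 4).astype(np.float32)
    q, scale = quantize_int8(w)
    print("max abs error:", np.abs(w - dequantize_int8(q, scale)).max())
```

float16 and nf4 follow the same store-small, compute-larger pattern with different encodings.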

Closing as stale. The lazy engine has changed a lot. Limiting buffers per kernel is tracked in #1461. Notably, this piece of code now runs without problems: ``` from tinygrad.tensor...

Closing as stale - on master, both the interpreted and compiled backends honor the middle downcast.
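A small numeric illustration of why the middle downcast has to be honored (plain numpy standing in for the backends; this is not tinygrad code): a float32 -> float16 -> float32 chain must round through float16 precision rather than silently staying in float32.

```python
import numpy as np

x = np.float32(0.1)
honored = np.float32(np.float16(x))  # middle downcast applied: rounds to fp16 precision
skipped = np.float32(x)              # middle downcast dropped: keeps full fp32 precision
print(honored, skipped, honored == skipped)  # the fp16 rounding is visible, so they differ
```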

Closing as stale. `PYOPENCL_COMPILER_OUTPUT=1 DEBUG=3 CPU=0 OPT=3 python -m pytest -s -v -n=auto test/test_ops.py::TestOps::test_ceil -n=1` passes on master, and PUSH_PERMUTES and PUSH_CONTIGUOUS have been removed.

I can run `METAL=1 MODEL=mrcnn python3 examples/mlperf/model_eval.py` on master. While we don't have general per-kernel buffer limiting yet, the `OPT` options discussed here have been deprecated, so I am...

Closing as obsolete. extra/utils.py has been refactored and removed.

I did not. I just ran `test/external/fuzz_symbolic.py` 10k times and the results all matched.
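For anyone unfamiliar with the idea, here is a toy sketch of the kind of check such a fuzzer performs: build random expressions, simplify them, and assert the simplified form evaluates to the same value as the original at random points. This is a self-contained illustration of the pattern, not the actual test/external/fuzz_symbolic.py:

```python
import random, operator

OPS = [(operator.add, "+"), (operator.sub, "-"), (operator.mul, "*")]

def random_expr(depth):
    # leaf: either the variable x or a small integer constant
    if depth == 0 or random.random() < 0.3:
        return ("x",) if random.random() < 0.5 else ("const", random.randint(-5, 5))
    op, name = random.choice(OPS)
    return ("op", op, name, random_expr(depth - 1), random_expr(depth - 1))

def evaluate(node, x):
    # direct evaluation of the expression tree at integer x
    if node[0] == "x": return x
    if node[0] == "const": return node[1]
    _, op, _, lhs, rhs = node
    return op(evaluate(lhs, x), evaluate(rhs, x))

def simplify(node):
    # toy simplifier: fold constant subtrees and apply a few algebraic identities
    if node[0] in ("x", "const"): return node
    _, op, name, lhs, rhs = node
    lhs, rhs = simplify(lhs), simplify(rhs)
    if lhs[0] == "const" and rhs[0] == "const": return ("const", op(lhs[1], rhs[1]))
    if name == "*" and ("const", 0) in (lhs, rhs): return ("const", 0)
    if name == "*" and rhs == ("const", 1): return lhs
    if name == "+" and rhs == ("const", 0): return lhs
    return ("op", op, name, lhs, rhs)

if __name__ == "__main__":
    for _ in range(10_000):
        expr, x = random_expr(depth=4), random.randint(-100, 100)
        assert evaluate(expr, x) == evaluate(simplify(expr), x)
    print("all 10k random expressions matched")
```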

The METAL test behaves differently on my local M1 vs CI; it might be related to another Metal issue. WebGPU fails because it's using the METAL backend. I marked this as draft for now...

I have a better way to improve this, and I want to separate (1) enabling JIT tests in CI for all backends from (2) improving the JIT tests (currently some tests still...