chenyu issues

Results: 80 issues of chenyu

idx simplification given valid

see https://github.com/tinygrad/tinygrad/issues/1801

simple linear kernel not fusing

```
from tinygrad import Tensor, GlobalCounters

a = Tensor.rand(3, 4).realize()
b = Tensor.rand(3, 4).realize()
w = Tensor([0.3]).realize()

GlobalCounters.reset()
(a+(b-a)*w).realize()
print(f"(a+(b-a)*w) {GlobalCounters.kernel_count=}")

GlobalCounters.reset()
(a*(1-w)+b*w).realize()
print(f"(a*(1-w)+b*w) {GlobalCounters.kernel_count=}")
```

```
(a+(b-a)*w) GlobalCounters.kernel_count=1
(a*(1-w)+b*w)...
```
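The two expressions in the report are algebraically the same linear interpolation, which is why a different kernel count between them is surprising. A minimal sketch in plain Python (no tinygrad involved; the function names `lerp_one_mul` and `lerp_two_mul` are made up for illustration) that checks the equivalence numerically:

```python
import math

def lerp_one_mul(a, b, w):
    # Form used in the first expression: a + (b - a) * w
    return a + (b - a) * w

def lerp_two_mul(a, b, w):
    # Form used in the second expression: a * (1 - w) + b * w
    return a * (1 - w) + b * w

# Sample values chosen for illustration only; w matches the 0.3 in the report.
a_vals = [0.0, 1.5, -2.0, 10.0]
b_vals = [1.0, -0.5, 4.0, 10.0]
w = 0.3

for a, b in zip(a_vals, b_vals):
    # The two forms agree up to floating-point rounding.
    assert math.isclose(lerp_one_mul(a, b, w), lerp_two_mul(a, b, w),
                        rel_tol=1e-9, abs_tol=1e-12)
```

This only shows that the math agrees; it says nothing about how tinygrad schedules either expression.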

inflated ops count for `a-b` compared to `a+b`

The ops count does not reflect the underlying compute when the underlying instructions are fused.

```
from tinygrad import Tensor, GlobalCounters

a = Tensor.rand(3, 4).realize()
b = Tensor.rand(3, 4).realize()

GlobalCounters.reset()
(a+b).realize()
print(f"a+b {GlobalCounters.global_ops=}")

GlobalCounters.reset()
...
```

UOp rewrite max to CMPLT and WHERE
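As a rough illustration of the idea in this title, `max(a, b)` can be expressed as a less-than comparison followed by a select. The sketch below uses plain Python stand-ins; `cmplt`, `where`, and `max_via_cmplt_where` are hypothetical helper names, not tinygrad's UOp API.

```python
def cmplt(a, b):
    # Stand-in for a CMPLT op: true when a is strictly less than b.
    return a < b

def where(cond, x, y):
    # Stand-in for a WHERE op: pick x when cond holds, otherwise y.
    return x if cond else y

def max_via_cmplt_where(a, b):
    # If a < b the maximum is b, otherwise it is a.
    return where(cmplt(a, b), b, a)

# Check against the builtin over a few pairs, including uint8-range values.
for a, b in [(0, 255), (255, 0), (7, 7), (-3, 2), (1.5, 1.25)]:
    assert max_via_cmplt_where(a, b) == max(a, b)
```

Because the result is built only from a comparison and a select, it works for any type that supports `<`, which is the appeal of this lowering.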

always use `(({a}>{b})?{a}:{b})` for cstyle max

fix max not defined for dtypes like uint8.
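To make the template concrete, here is a small sketch that fills `(({a}>{b})?{a}:{b})` with operand strings using Python's `str.format`. The names `MAX_TEMPLATE` and `render_cstyle_max` are assumptions for illustration and are not taken from the tinygrad renderer.

```python
# The template string quoted in the issue title.
MAX_TEMPLATE = "(({a}>{b})?{a}:{b})"

def render_cstyle_max(a: str, b: str) -> str:
    # Substitute the operand expressions into the ternary template.
    return MAX_TEMPLATE.format(a=a, b=b)

print(render_cstyle_max("x", "y"))  # ((x>y)?x:y)
```

A ternary of this shape relies only on the `>` operator, so it does not need a `max` function to exist for the operand type. Note that each operand string appears twice in the rendered expression.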

no UnaryOps.NEG in generated UOp patterns

Removed the patterns `x * (-1) -> -x` and `x != True`. Investigating why multireduce tests in `test_linearizer` are failing in _assert_valid_uop...
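For readers unfamiliar with this kind of pattern, the sketch below shows what a `x * (-1) -> -x` rewrite does on a toy expression tree. The tuple representation and the function `rewrite_mul_neg_one` are hypothetical and much simpler than tinygrad's actual UOp pattern matcher.

```python
def rewrite_mul_neg_one(expr):
    """Recursively rewrite ("MUL", x, ("CONST", -1)) into ("NEG", x)."""
    if not isinstance(expr, tuple):
        return expr  # leaf: a variable name
    op, *args = expr
    if op == "CONST":
        return expr  # constants are left untouched
    # Rewrite children first, then try the pattern at this node.
    args = [rewrite_mul_neg_one(arg) for arg in args]
    if op == "MUL" and len(args) == 2 and args[1] == ("CONST", -1):
        return ("NEG", args[0])
    return (op, *args)

expr = ("ADD", ("MUL", "x", ("CONST", -1)), "y")
print(rewrite_mul_neg_one(expr))  # ('ADD', ('NEG', 'x'), 'y')
```

Removing such a pattern means the multiplication by `-1` stays in the tree and no `NEG` node is produced.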

real strides with uops [run_process_replay]

image_dot of 2 half inputs returns in half instead of float

`PYTHONPATH="." GPU=1 IMAGE=0 python -m pytest test/test_ops.py -k test_gemm_fp16` passed.

`PYTHONPATH="." GPU=1 IMAGE=2 python -m pytest test/test_ops.py -k test_gemm_fp16` failed with `Exception: forward pass failed shape (64, 64): dtype mismatch:...
