Yixiang Gao

Results 14 comments of Yixiang Gao

@d07RiV I was looking into **itemset.js**, is the set not included in there yet?

Working on DETR; it's a bit tricky. I'll explain in the PR.

There's already an LR scheduler implementation in [extra/lr_scheduler.py](https://github.com/geohot/tinygrad/blob/master/extra/lr_scheduler.py). Instead of writing a whole new `optim.SGD` class, I would just add the OneCycle LR there.
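For reference, the OneCycle schedule itself is just a warmup phase followed by an annealing phase. A minimal sketch of the math as a standalone function (the name, defaults, and cosine-annealing choice are illustrative, not tinygrad's actual API):

```python
import math

def one_cycle_lr(step, total_steps, max_lr, initial_div=25.0, final_div=1e4, pct_start=0.3):
    """Learning rate at `step` under a OneCycle schedule (cosine variant).

    Warms up from max_lr/initial_div to max_lr over the first pct_start
    fraction of training, then anneals down to max_lr/final_div.
    Defaults mirror common OneCycle implementations, not tinygrad's.
    """
    start_lr = max_lr / initial_div
    end_lr = max_lr / final_div
    warmup_steps = int(total_steps * pct_start)
    if step < warmup_steps:
        # cosine ramp from start_lr up to max_lr
        t = step / max(1, warmup_steps)
        return start_lr + (max_lr - start_lr) * (1 - math.cos(math.pi * t)) / 2
    # cosine anneal from max_lr down to end_lr
    t = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return max_lr + (end_lr - max_lr) * (1 - math.cos(math.pi * t)) / 2
```

Hooking this into `extra/lr_scheduler.py` would just mean wrapping it in a class that sets the optimizer's `lr` each step, the same way the existing schedulers there do.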

lol made a mistake

While looking into it, I think there's also a bug at [if view.contiguous: return new_view, False # NOTE: if it's contiguous it can't have an offset](https://github.com/tinygrad/tinygrad/blob/c5aea13a6503f07e93e46f345fbf52da3b54d28d/tinygrad/shape/shapetracker.py#L115C23-L115C29) where the mask index...
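For context on why contiguity implies no offset: a strided view is normally considered contiguous only when it has a zero offset, no mask, and strides equal to the dense row-major strides of its shape. A minimal version of that check (helper names are illustrative, not tinygrad's internals):

```python
def row_major_strides(shape):
    # strides for a dense row-major layout of `shape`
    strides, acc = [], 1
    for dim in reversed(shape):
        strides.insert(0, acc)
        acc *= dim
    return tuple(strides)

def is_contiguous(shape, strides, offset=0, mask=None):
    # contiguous means: zero offset, no mask, and strides matching the
    # dense row-major strides (the stride is irrelevant for size-1 dims)
    if offset != 0 or mask is not None:
        return False
    return all(s == rs or d == 1
               for d, s, rs in zip(shape, strides, row_major_strides(shape)))
```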

I will have to make sure the fix passes all the tests and is logically correct. I don't believe it's going to be a lot of lines, but the logic is a bit...

Was trying to add an offset to the [buffer](https://github.com/tinygrad/tinygrad/blob/d0e21a7398ed08809ff77e6678cfb86b06906262/tinygrad/runtime/ops_cuda.py#L74) where the kernel was called, but I couldn't do it in `pycuda` without making a copy. So now I will try to...
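For comparison, the zero-copy behavior wanted here is exactly what numpy slicing gives on the CPU side: an offset view whose data pointer is the base pointer plus a byte offset, with no copy made. A small sketch of the idea:

```python
import numpy as np

buf = np.arange(8, dtype=np.float32)   # backing buffer
view = buf[3:]                         # offset view: no copy, shares memory

view[0] = 42.0                         # writes through to the backing buffer
assert buf[3] == 42.0
assert view.base is buf                # proof it's a view, not a copy
# the view's data pointer is the base pointer plus a byte offset
offset_bytes = view.ctypes.data - buf.ctypes.data
assert offset_bytes == 3 * buf.itemsize
```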

mypy considers `self.subprg` to be `List[str]`, which gives a "not callable" error, so that line is ignored.
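A minimal reproduction of that kind of mypy complaint, with hypothetical names: when an attribute's inferred type includes a non-callable container, calling it is flagged even though the runtime value is callable, so the call site gets a `# type: ignore`.

```python
from typing import Callable, List, Union

class Prog:
    # the attribute can hold either source lines or a compiled kernel
    subprg: Union[List[str], Callable[[], int]]

    def __init__(self, compiled: bool) -> None:
        self.subprg = (lambda: 42) if compiled else ["kernel source"]

    def run(self) -> int:
        # mypy: error: "List[str]" not callable -> silenced at the call site
        return self.subprg()  # type: ignore

print(Prog(compiled=True).run())  # runs fine at runtime
```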

After some discussion on the Discord, the global dim limit issue would also happen on WebGPU. However, could this potentially be a bug where the [linearizer](https://github.com/tinygrad/tinygrad/blob/872e2198fe87374eb60c05bad7852bda77e2981a/tinygrad/codegen/linearizer.py#L581) did not push everything to...
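For reference, the usual workaround for a per-dimension launch limit is to factor an oversized global dimension across two grid axes. A hedged sketch of that factorization (the helper name is illustrative; 65535 is WebGPU's default `maxComputeWorkgroupsPerDimension`):

```python
def split_global_dim(size, limit):
    """Split a 1-D global size into (outer, inner) with inner <= limit
    and outer * inner == size. Returns None if no exact factorization
    exists (in which case padding plus a bounds check would be needed)."""
    if size <= limit:
        return (1, size)
    # largest inner factor that still fits under the per-dim limit
    for inner in range(limit, 0, -1):
        if size % inner == 0:
            return (size // inner, inner)
    return None

print(split_global_dim(131072, 65535))  # -> (4, 32768)
```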

Literally one additional batch size breaks it.

```
$ GPU=1 CUDA=1 BS=1023 STEPS=3 python examples/hlb_cifar10.py
0 4444.05 ms run, 4443.82 ms python, 0.23 ms CL, 2.94 loss, 0.007055 LR, 0.10...
```