reddyn12 comments

Results 32 comments of reddyn12

Mamba Implementation

I fixed the small-prompt bug. I'm looking over #3600 and I can't build on what @chenyuxyz discovered.

The CLANG bug is fixed. However, the .copyin() method in ops_cuda was changed from cuda.cuMemcpyHtoD_v2 to cuda.cuMemcpyHtoDAsync_v2, which breaks load_state_dict() in nn.state. Can you confirm this breaks on multi...

.softmax().argmax() broken in CLANG

issue inherited with PYTHON

Tinygrad Inference Router

~~I can move the ```build_llama``` into ```llama.py``` and rename the function to ```build_model```. This will make the ```if/else and raise``` not necessary. lmk how I should approach this~~ I'm dumb,...

Tinygrad Inference Router

@AlexCheema is this proposal fine, or is there a better way to do this?

Tinygrad Inference Router

Ok, I'll close this PR, do the LLaVa tinygrad implementation PR with this kind of logic, and then put up another PR that refactors the inference logic to use the...

Flux.1

Got 25.6 sec on M3 Max

[BOUNTY - $100] Vision Model Integration Test

I can do this

[BOUNTY - $100] Vision Model Integration Test

Looks like @varshith15 got it. I'll go back to tinygrad llava

[BOUNTY - $100] Add support for LLaVA (tinygrad)

@AlexCheema I can implement LLaVa in tinygrad