reddyn12
I fixed the small-prompt bug. I'm looking over #3600 and I can't build on what @chenyuxyz discovered.
The CLANG bug is fixed. However, the `.copyin()` method in `ops_cuda` was changed from `cuda.cuMemcpyHtoD_v2` to `cuda.cuMemcpyHtoDAsync_v2`, which breaks `load_state_dict()` in `nn.state`. Can you confirm this breaks on multi...
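For context, here is a toy sketch of one way an async host-to-device copy could corrupt a weight load. It is only a guess at the failure mode, not a diagnosis: it assumes the host staging buffer gets reused before the queued copy actually runs. `ToyDevice`, `copy_sync`, `copy_async`, `synchronize`, and `load` are all hypothetical names, and nothing here is tinygrad or CUDA code.

```python
# Toy model only: a fake "device" whose async copies are deferred until synchronize().
class ToyDevice:
    def __init__(self):
        self.pending = []  # queued (dest, src) pairs for async copies

    def copy_sync(self, dest: bytearray, src: bytearray) -> None:
        # Blocking copy: src is read immediately.
        dest[:] = src

    def copy_async(self, dest: bytearray, src: bytearray) -> None:
        # Deferred copy: only a reference to src is kept, no snapshot is taken.
        self.pending.append((dest, src))

    def synchronize(self) -> None:
        # Queued copies read src only now, after the caller may have overwritten it.
        for dest, src in self.pending:
            dest[:] = src
        self.pending.clear()


def load(device: ToyDevice, copy, chunks):
    staging = bytearray(4)  # one host buffer reused for every chunk (assumption)
    outs = []
    for chunk in chunks:
        staging[:] = chunk  # overwrite the staging buffer with the next chunk
        dest = bytearray(4)
        copy(dest, staging)
        outs.append(dest)
    device.synchronize()
    return [bytes(o) for o in outs]


chunks = [b"aaaa", b"bbbb"]
dev = ToyDevice()
sync_result = load(dev, dev.copy_sync, chunks)    # each dest gets its own chunk
async_result = load(dev, dev.copy_async, chunks)  # every dest sees the last chunk
print(sync_result, async_result)
```

If something like this is what is happening, either keeping the source buffer alive and untouched until the copy finishes or synchronizing before reuse would avoid it.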
issue inherited with PYTHON
~~I can move the ```build_llama``` into ```llama.py``` and rename the function to ```build_model```. This will make the ```if/else and raise``` not necessary. lmk how I should approach this~~ I'm dumb,...
@AlexCheema is this proposal fine, or is there a better way to do this?
Ok, I'll close this pr, do the LLaVa tinygrad implementation pr with this kind of logic, and then put up another pr that refactors the inference logic to use the...
Got 25.6 sec on M3 Max
I can do this
Looks like @varshith15 got it. I'll go back to tinygrad llava
@AlexCheema I can implement LLaVa in tinygrad