Tinygrad Inference Router
Add logic to support building different model types in tinygrad inference.
~~I could move `build_llama` into llama.py and rename the function to `build_model`. That would make the if/else and the raise unnecessary. Let me know how I should approach this.~~
Scratch that, we'll still need an if/else to load the right build function.
@AlexCheema is this proposal fine, or is there a better way to do this?
Ideally we'd do this like #139, where models are supported automatically without needing separate `build_{model}` functions.
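For reference, one middle ground between a hard-coded if/else chain and fully automatic support is a small registry that maps model-name prefixes to builder functions. This is just a hypothetical sketch — the builder names and signatures here are placeholders, not the actual exo/tinygrad API:

```python
# Hypothetical sketch: dispatch to a builder by model-name prefix
# instead of a growing if/else chain. Adding a new model type is
# then a single registry entry. Builders below are placeholders.

def build_llama(model_name: str) -> str:
    # Stand-in for the real tinygrad LLaMA builder.
    return f"llama model: {model_name}"

def build_llava(model_name: str) -> str:
    # Stand-in for a future LLaVA builder.
    return f"llava model: {model_name}"

# Registry mapping lowercase name prefixes to builder functions.
MODEL_BUILDERS = {
    "llama": build_llama,
    "llava": build_llava,
}

def build_model(model_name: str):
    """Route to the matching builder based on the model name prefix."""
    for prefix, builder in MODEL_BUILDERS.items():
        if model_name.lower().startswith(prefix):
            return builder(model_name)
    raise ValueError(f"Unsupported model type: {model_name}")
```

The if/else (and the raise) still exists in spirit, but it lives in one generic `build_model` rather than being duplicated per call site.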
Ok, I'll close this PR, do the LLaVA tinygrad implementation PR with this kind of logic, and then put up another PR that refactors the inference logic to use the `stateful_shards`.