flashinfer
Basic inference example for Llama/Mistral
Hey there,
Thanks for sharing your library!
Is there a basic Llama/Mistral example implemented that we could read through?
I'd like to test the inference code on the Mistral 7B reference implementation. Thanks!
Hi @vgoklani, good idea! I'm thinking about a minimal end-to-end example (~500 LOC), please stay tuned :)
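While waiting for the official example, here's a rough sketch of the core operation such an example would exercise: single-query ("decode") attention over a KV cache, written in plain NumPy. This is just the math for illustration, not flashinfer's actual API; all names here are made up for the sketch.

```python
import numpy as np

def decode_attention(q, k_cache, v_cache):
    """Single-query decode attention over a KV cache (illustrative only).

    q:        (num_heads, head_dim)       query for the one new token
    k_cache:  (seq_len, num_heads, head_dim)
    v_cache:  (seq_len, num_heads, head_dim)
    returns:  (num_heads, head_dim)
    """
    scale = 1.0 / np.sqrt(q.shape[-1])
    # scores[h, t] = dot(q[h], k_cache[t, h]) * scale
    scores = np.einsum("hd,thd->ht", q, k_cache) * scale
    # numerically stable softmax over the sequence axis
    scores -= scores.max(axis=1, keepdims=True)
    probs = np.exp(scores)
    probs /= probs.sum(axis=1, keepdims=True)
    # attention-weighted sum of cached values
    return np.einsum("ht,thd->hd", probs, v_cache)

rng = np.random.default_rng(0)
num_heads, head_dim, seq_len = 4, 8, 16
q = rng.standard_normal((num_heads, head_dim))
k = rng.standard_normal((seq_len, num_heads, head_dim))
v = rng.standard_normal((seq_len, num_heads, head_dim))
out = decode_attention(q, k, v)
print(out.shape)  # (4, 8)
```

In a real flashinfer-based example this per-token op would run as a fused GPU kernel inside the generation loop, with the KV cache appended to after each decoded token.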
Thanks! Something using nanoGPT (framework independent) would be great!
Any update on this?