flashinfer
Basic inference example for Llama/Mistral
Hey there,
Thanks for sharing your library!
Is there a basic Llama/Mistral example implemented that we could read through?
I'd like to test the inference code on the Mistral 7B reference implementation. Thanks!
Hi @vgoklani, good idea! I'm thinking about a minimal end-to-end example (~500 LOC), please stay tuned :)
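While waiting for the official example, here's a rough sketch of the core operation such an example would exercise: single-query ("decode") attention over a KV cache, written in plain NumPy. This is just the math for illustration, not flashinfer's actual API; all names here are made up for the sketch.

```python
import numpy as np

def decode_attention(q, k_cache, v_cache):
    """Single-query decode attention over a KV cache (illustrative only).

    q:        (num_heads, head_dim)       query for the one new token
    k_cache:  (seq_len, num_heads, head_dim)
    v_cache:  (seq_len, num_heads, head_dim)
    returns:  (num_heads, head_dim)
    """
    scale = 1.0 / np.sqrt(q.shape[-1])
    # scores[h, t] = dot(q[h], k_cache[t, h]) * scale
    scores = np.einsum("hd,thd->ht", q, k_cache) * scale
    # numerically stable softmax over the sequence axis
    scores -= scores.max(axis=1, keepdims=True)
    probs = np.exp(scores)
    probs /= probs.sum(axis=1, keepdims=True)
    # attention-weighted sum of cached values
    return np.einsum("ht,thd->hd", probs, v_cache)

rng = np.random.default_rng(0)
num_heads, head_dim, seq_len = 4, 8, 16
q = rng.standard_normal((num_heads, head_dim))
k = rng.standard_normal((seq_len, num_heads, head_dim))
v = rng.standard_normal((seq_len, num_heads, head_dim))
out = decode_attention(q, k, v)
print(out.shape)  # (4, 8)
```

In a real flashinfer-based example this per-token op would run as a fused GPU kernel inside the generation loop, with the KV cache appended to after each decoded token.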
Thanks! Something using nanoGPT (framework independent) would be great!
Any update on this?