joshpopelka20gmail issues

Results: 3 issues of joshpopelka20gmail

Add support for huggingface embeddings

For my project, I'm using an embedding model for clinical text. The model is available on Hugging Face, but I don't see a crate for Hugging Face embeddings. Is it possible to...

Enabling prefix cache for llama3 gguf

For my use case, I've been informed that prefix caching may help me reduce inference time (I'm working on an internal web service). Looking through the codebase, I see that...

Llama 3 ring attention implementation for inference

Hope you can help with this. I'm trying to implement ring attention using the Llama 3 architecture, and I'm starting with the blockwise parallel transformer piece. My question is when do...