joshpopelka20gmail
For my project, I'm using an embedding model for clinical text. The model is available on Hugging Face, but I don't see a crate for Hugging Face embeddings. Is it possible to...
For my use case, I've been informed that prefix caching may help me reduce inference time (I'm working on an internal web service). Looking through the codebase, I see that...
Hope you can help with this. I'm trying to implement ring attention using the Llama 3 architecture, and I'm starting with the blockwise parallel transformer piece. My question is when do...