Eric Buehler

543 comments by Eric Buehler

I was able to reproduce the error by running the following in quick succession:

```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer EMPTY" \
  -d '{ "model":...
```

@lucasavila00, this looks great. It'll require modifying the attention mask calculation of every model, so it may be helpful to factor those out into a `layers.rs` in `mistralrs-core`.
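As a rough illustration of the factoring suggested above, a shared mask helper could live in one place and be called by every model. The sketch below is plain Rust with no Candle dependency. The function name `causal_mask`, its parameters, and the `Vec<Vec<f32>>` representation are assumptions made for illustration, not code from `mistralrs-core`.

```rust
/// Hypothetical shared helper (not the actual mistral.rs API).
/// Builds a causal attention mask for `seq_len` new tokens, given
/// `past_len` tokens already held in the KV cache.
/// Entry [i][j] is 0.0 when query i may attend to key j,
/// and negative infinity when key j lies in the future.
pub fn causal_mask(seq_len: usize, past_len: usize) -> Vec<Vec<f32>> {
    let total = past_len + seq_len;
    (0..seq_len)
        .map(|i| {
            (0..total)
                .map(|j| {
                    // Query i sits at absolute position past_len + i.
                    if j <= past_len + i {
                        0.0
                    } else {
                        f32::NEG_INFINITY
                    }
                })
                .collect()
        })
        .collect()
}
```

With a helper like this, each model would call `causal_mask(seq_len, past_len)` instead of carrying its own copy, so a change to the mask calculation would only need to be made once.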

@lucasavila00, I am actually going to end up adding this in #242.

Yes, I've been tracking that. I have merged the upstream changes now, so it should be faster.

Ah, that could be it. Looking forward to the Candle implementation; maybe we can author a PR.

I think the llama.cpp issue described performance regressions beyond a batch size (BS) of 4.

I can add the specialized kernels on our branch. Do you think that would be good? I wonder why llama.cpp moved from 8 to 4; 5370 did not specify a...

Refs https://github.com/huggingface/candle/pull/2077