Eric Buehler
@igo, I think you may get this error if the token source is not set up correctly; for a gated model such as Llama 3, a missing token would cause it.
Refs #222.
@igo, do you have an HF token set in your cache? Mistral requires an HF token, so if you set the token source to 'none' it will not work.
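For reference, here is a minimal sketch of how to check whether a token is already cached, assuming the default Hugging Face cache layout (`~/.cache/huggingface/token`, or `$HF_HOME/token` if set); the exact location a given tool reads may differ:

```shell
# Sketch: look for a cached Hugging Face token in the default location
# written by `huggingface-cli login`.
TOKEN_FILE="${HF_HOME:-$HOME/.cache/huggingface}/token"
if [ -s "$TOKEN_FILE" ]; then
    echo "HF token found at $TOKEN_FILE"
else
    # Gated models (e.g. Llama 3, Mistral) will fail to download without one.
    echo "No HF token; run 'huggingface-cli login' to create one"
fi
```

Running `huggingface-cli login` (after accepting the model's license on the Hub) writes the token to that path.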
@igo, #225 fixed #223, which looks similar. On my machine, the examples you gave work. Does it work for you? Regarding this command:

```
./mistralrs_server --port 1234 lora-gguf -o orderings/xlora-paper-ordering.json...
```
> Regarding that command, I copied it from README so I would expect it should work.

Ah, sorry. I will fix it.

> It does not compile anymore:

It should...
Can you please run `cargo update` and `git pull`? There was a new variant `TensorF16` introduced on our Candle fork a few days ago.
Great! I'm glad that it works.

> Although it's much much slow compared to Ollama's phi3 (maybe because of different quantization).

Are you comparing against Ollama's quantized phi3? In that...
@igo, I'm closing this issue as I think the problems are resolved. However, please feel free to reopen!
@lucasavila00, thank you. Can you please test the Python bindings? It also seems like there are some more merge conflicts.
@lucasavila00 thank you!