Paul Tardy

Results: 34 comments by Paul Tardy

I think that in the translation server, loading/unloading and even running a model were **not** thread safe. I don't know anything about CTranslate2, though, so I can't tell how...
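Without knowing the server internals, here is a minimal sketch of the kind of guard I'd expect around those operations; `ModelSlot` and `loader` are hypothetical names for illustration, not the server's actual API:

```python
import threading

class ModelSlot:
    """Hypothetical wrapper (not the actual server code): one lock
    serializes load/unload/run so concurrent requests cannot race
    on the model state."""

    def __init__(self, loader):
        self._loader = loader          # callable returning a model object
        self._lock = threading.Lock()  # guards every state change below
        self._model = None

    def load(self):
        with self._lock:
            self._model = self._loader()

    def unload(self):
        with self._lock:
            self._model = None

    def run(self, batch):
        with self._lock:               # running holds the same lock
            if self._model is None:
                raise RuntimeError("model not loaded")
            return self._model(batch)
```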

It's running the llama2 model on Colab with 1x V100 GPU, following https://github.com/ollama/ollama/blob/09a6f76f4c30fb8a9708680c519d08feeb504197/examples/jupyter-notebook/ollama.ipynb
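For context, this is roughly what such a setup boils down to; the exact cells in the linked notebook differ, and the background-start detail here is an assumption on my part:

```python
import subprocess

# Start the ollama server in the background so later cells can call it.
server = subprocess.Popen(["ollama", "serve"])

# Pull the llama2 weights once before generating against them.
subprocess.run(["ollama", "pull", "llama2"], check=True)
```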

I understand that CUDA should not be considered deterministic by default, so I would not be surprised to find small discrepancies from one run to another. On the other hand, it...
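If run-to-run variance matters, the sampling randomness can at least be pinned down through generation options; `seed` and `temperature` are documented ollama options, but the values here are assumptions, and even with a fixed seed the CUDA kernels themselves may still introduce small numeric differences:

```python
from ollama import Client

client = Client(host="http://127.0.0.1:11434")  # default local endpoint

# Fix the sampling seed and disable temperature-driven randomness; any
# remaining differences would then come from lower layers (e.g. CUDA).
response = client.generate(
    "llama2",
    "Why is the sky blue?",
    options={"seed": 42, "temperature": 0.0},
)
print(response["response"])
```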

@MichaelFomenko the server uses the `llama2` model and runs on Colab following this example: https://github.com/ollama/ollama/blob/09a6f76f4c30fb8a9708680c519d08feeb504197/examples/jupyter-notebook/ollama.ipynb The generate call itself uses default values: `outputs = [client.generate("llama2", "Why is the sky blue?")["response"]...`
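The full call is cut off above; a hedged reconstruction of what such a list comprehension would look like (the repeat count and the distinct-response check are my own illustration, not the original code):

```python
from ollama import Client

client = Client()  # defaults to http://127.0.0.1:11434

# Generate the same prompt several times with default options and
# check how many distinct responses come back.
outputs = [
    client.generate("llama2", "Why is the sky blue?")["response"]
    for _ in range(3)  # repeat count is an assumption
]
print(f"{len(set(outputs))} distinct responses out of {len(outputs)}")
```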