exo
Inference speed is too slow on Linux
The same model runs much faster on Ollama.
Regardless of the operating system, and even with sufficient memory, adding more machines makes inference slower. This is a design flaw.
But it only reaches 0.2 tokens/s, which is far slower than other inference stacks. 🫠
Three nodes totaling 240 TFLOPS cannot even answer "Hello".
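For comparing throughput numbers like the 0.2 tokens/s above across setups, it helps to measure them the same way. Below is a minimal sketch of such a measurement; `dummy_generate` is a hypothetical stand-in for a real exo or Ollama client call, not part of either project's API:

```python
import time

def measure_tps(generate, prompt, runs=3):
    """Average decode throughput (tokens/s) of a generate() callable
    that takes a prompt and returns the list of generated tokens."""
    total_tokens = 0
    total_time = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        tokens = generate(prompt)
        total_time += time.perf_counter() - start
        total_tokens += len(tokens)
    return total_tokens / total_time

if __name__ == "__main__":
    # Hypothetical generator standing in for a real inference backend:
    def dummy_generate(prompt):
        time.sleep(0.01)      # pretend decoding takes 10 ms
        return ["tok"] * 5    # pretend 5 tokens were produced
    print(f"{measure_tps(dummy_generate, 'Hello'):.1f} tokens/s")
```

Running the same harness against each backend (exo on one node, exo on three nodes, Ollama) would make the slowdown claim directly comparable.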