
Inference speed is too slow on Linux

Open ImagineMiracle-wxn opened this issue 9 months ago • 3 comments

Image

The same model running on Ollama:

Image

ImagineMiracle-wxn, Mar 18 '25 03:03

Regardless of the operating system, and assuming there is sufficient memory, the more machines you add, the slower it runs. This is a design flaw.
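A rough way to see how adding machines could reduce single-request speed is the toy model below. It is only a sketch under stated assumptions, not a description of exo's actual scheduler: it assumes the model's layers are split sequentially across equally fast nodes, that each token must pass through every node in turn, and that each hand-off between nodes costs a fixed network delay. The function name and all parameters are hypothetical.

```python
def tokens_per_second(total_compute_s: float, num_nodes: int, hop_latency_s: float) -> float:
    """Estimate single-stream decode speed for a sequential layer split.

    total_compute_s: time to run all layers for one token (hypothetical value).
    num_nodes:       number of machines sharing the layers.
    hop_latency_s:   assumed fixed cost of passing activations to the next node.
    """
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    # Assumption: nodes work one after another on the same token, so splitting
    # the layers does not reduce the compute time needed for that token.
    hops = num_nodes - 1
    per_token_s = total_compute_s + hops * hop_latency_s
    return 1.0 / per_token_s


if __name__ == "__main__":
    # Illustrative numbers only; they are not measurements from this issue.
    for n in (1, 2, 3):
        print(n, "node(s):", round(tokens_per_second(0.05, n, 0.02), 2), "tokens/s")
```

Under these assumptions the per-token compute stays constant while the network cost grows with every added node, so the estimate can only fall as nodes are added. Extra machines would help with fitting a larger model in memory, not with the speed of one request.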

cgoxopx, Mar 19 '25 02:03

Regardless of the operating system, and assuming there is sufficient memory, the more machines you add, the slower it runs. This is a design flaw.

But it is only 0.2 tokens/s, which is far slower than other architectures. 🫠

ImagineMiracle-wxn, Mar 24 '25 03:03

Three nodes of 240 TFLOPS, yet it cannot answer "Hello".

Image

jli113, Mar 25 '25 12:03