exo
Inference speed is too slow on Linux
The same model runs much faster on Ollama.
Regardless of the operating system, and even with sufficient memory, adding more machines makes inference slower. This is a design flaw.
But it only reaches 0.2 tokens/s, which is far slower than other inference stacks. 🫠
Three nodes totaling 240 TFLOPS cannot even answer "Hello".
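For comparing throughput numbers like the 0.2 tokens/s above across setups, it helps to measure them the same way. Below is a minimal sketch of such a measurement; `dummy_generate` is a hypothetical stand-in for a real exo or Ollama client call, not part of either project's API:

```python
import time

def measure_tps(generate, prompt, runs=3):
    """Average decode throughput (tokens/s) of a generate() callable
    that takes a prompt and returns the list of generated tokens."""
    total_tokens = 0
    total_time = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        tokens = generate(prompt)
        total_time += time.perf_counter() - start
        total_tokens += len(tokens)
    return total_tokens / total_time

if __name__ == "__main__":
    # Hypothetical generator standing in for a real inference backend:
    def dummy_generate(prompt):
        time.sleep(0.01)      # pretend decoding takes 10 ms
        return ["tok"] * 5    # pretend 5 tokens were produced
    print(f"{measure_tps(dummy_generate, 'Hello'):.1f} tokens/s")
```

Running the same harness against each backend (exo on one node, exo on three nodes, Ollama) would make the slowdown claim directly comparable.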