Add Linux NPU & GPU support to Lemonade Server
I'm opening this discussion issue to get feedback from the community about what they would use Linux support for in conjunction with Lemonade Server.
It would help if people commented with their use case, the hardware they are running, the models they are interested in, etc. Having that written down here would give us some concrete targets to go after.
At the moment I'm using a Ryzen CPU, a 7900 XTX, Ubuntu 24.04 LTS, ROCm 6.3.3 (6.3.3.60303-74~24.04), and whatever the latest versions of Ollama and Open WebUI are. Docker does a nice job of keeping my machine from getting too cluttered.

Models used? Whatever seems interesting/useful for text processing (e.g. code generation, document summaries), is easily imported into Ollama, and fits in VRAM, e.g.:
- codegemma
- codellama
- codestral
- deepseek-r1
- deepcoder
- starcoder
- qwen-2.5-coder
- cogito
- phi4
- falcon3
- gemma3
- granite3.2
- granite-code
- dolphin-mixtral
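For reference, the containerized Ollama-on-ROCm setup described above can be sketched with a Compose file along these lines. This is a minimal sketch, not my exact configuration: the `ollama/ollama:rocm` image tag and the `/dev/kfd` + `/dev/dri` device passthrough follow Ollama's documented Docker instructions for AMD GPUs, and the volume/port names are illustrative.

```yaml
# docker-compose.yml -- minimal sketch of running Ollama with ROCm in Docker.
# Device paths and image tag follow Ollama's AMD GPU Docker docs; verify
# against your own system before relying on this.
services:
  ollama:
    image: ollama/ollama:rocm
    devices:
      - /dev/kfd              # ROCm compute interface
      - /dev/dri              # GPU render nodes
    volumes:
      - ollama:/root/.ollama  # keep pulled models out of the host filesystem
    ports:
      - "11434:11434"         # Ollama's HTTP API
    restart: unless-stopped

volumes:
  ollama:
```

Open WebUI can then point at `http://localhost:11434` as its Ollama backend, and the whole stack stays isolated from the host install.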
OpenAI Whisper (speech recognition model) - not a particularly heavy workload; I would love a Linux-based local voice assistant that doesn't draw a lot of power.
Native NPU support in Linux would be a game changer for us.
- https://github.com/EricLBuehler/mistral.rs/issues/1254
- https://github.com/amd/gaia/issues/9
- https://github.com/ollama/ollama/issues/5186
- https://github.com/AMD-AIG-AIMA/Instella/issues/1
- https://github.com/ggml-org/llama.cpp/issues/1499
I am currently working on an SME AI appliance and am exploring various hardware options, with the AMD Ryzen AI 300 APU series being a hot contender. However, the prerequisite for this would be optimal LLM inference performance, as demonstrated by the OGA hybrid workflows.
FYI folks: Lemonade SDK has moved to a new repository, and I have re-opened this issue there: https://github.com/lemonade-sdk/lemonade/issues/5