Add Linux NPU & GPU support to Lemonade Server
I'm opening this discussion issue to get feedback from the community about what they would use Linux support for in conjunction with Lemonade Server.
It would help if people commented with their use case, the hardware they are running, the models they are interested in, etc. Having that written down here would give us some concrete targets to go after.
At the moment I'm using a Ryzen CPU, a 7900 XTX, Ubuntu 24.04 LTS, ROCm 6.3.3 (6.3.3.60303-74~24.04), and whatever the latest versions of Ollama and Open WebUI are. Docker does a nice job of keeping my machine from getting too cluttered.

Models used? Whatever seems interesting/useful for text processing (e.g. code generation, document summaries), is easily imported into Ollama, and fits in VRAM, e.g.:
- codegemma
- codellama
- codestral
- deepseek-r1
- deepcoder
- starcoder
- qwen-2.5-coder
- cogito
- phi4
- falcon3
- gemma3
- granite3.2
- granite-code
- dolphin-mixtral
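For reference, the containerized Ollama-on-ROCm setup described above can be sketched with a Compose file along these lines. This is a minimal sketch, not my exact configuration: the `ollama/ollama:rocm` image tag and the `/dev/kfd` + `/dev/dri` device passthrough follow Ollama's documented Docker instructions for AMD GPUs, and the volume/port names are illustrative.

```yaml
# docker-compose.yml -- minimal sketch of running Ollama with ROCm in Docker.
# Device paths and image tag follow Ollama's AMD GPU Docker docs; verify
# against your own system before relying on this.
services:
  ollama:
    image: ollama/ollama:rocm
    devices:
      - /dev/kfd              # ROCm compute interface
      - /dev/dri              # GPU render nodes
    volumes:
      - ollama:/root/.ollama  # keep pulled models out of the host filesystem
    ports:
      - "11434:11434"         # Ollama's HTTP API
    restart: unless-stopped

volumes:
  ollama:
```

Open WebUI can then point at `http://localhost:11434` as its Ollama backend, and the whole stack stays isolated from the host install.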
OpenAI Whisper (speech recognition model) - not a particularly heavy workload; I would love a Linux-based local voice assistant that doesn't draw a lot of power.
Native NPU support in Linux would be a game changer for us.
- https://github.com/EricLBuehler/mistral.rs/issues/1254
- https://github.com/amd/gaia/issues/9
- https://github.com/ollama/ollama/issues/5186
- https://github.com/AMD-AIG-AIMA/Instella/issues/1
- https://github.com/ggml-org/llama.cpp/issues/1499
I am currently working on an SME AI appliance and am exploring various hardware options, with the AMD Ryzen AI 300 APU series being a hot contender. However, the prerequisite for this would be optimal LLM inference performance, as demonstrated by the OGA hybrid workflows.
FYI folks: Lemonade SDK has moved to a new repository, and I have re-opened this issue there: https://github.com/lemonade-sdk/lemonade/issues/5