Update Spin SDK in AI examples
E.g. the Rust SDK is currently on v1.5.
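For the Rust examples that presumably just means bumping the dependency in each example's Cargo.toml, along these lines (version shown for illustration; check crates.io for the latest):

```toml
# Example dependency bump in an AI example's Cargo.toml (version is illustrative)
[dependencies]
spin-sdk = "1.5"
```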
@itowlson is it possible to use a local LLM instead of the Spin Cloud GPU? I might run out of free tokens before I can demo my app lol
@codeitlikemiley Yes, local is the default. The guide for setting up local LLMs is at https://spinframework.dev/v3/serverless-ai-api-guide#file-structure.
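From memory the expected layout is roughly the one below, but treat it as a sketch and check the guide for the exact directory and file names:

```
.spin/
  ai-models/
    llama2-chat              # inferencing model weights, file named after the model
    all-minilm-l6-v2/        # embedding model directory
      tokenizer.json
      model.safetensors
```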
Please allow plenty of testing time. Local LLM is not a feature that gets much love at the moment, because local use can be punishingly slow (I believe the default build of Spin doesn't use the GPU for nasty reasons relating to portability), and what love it does get tends to be limited in terms of what models we test with. So you'll definitely want to check that your models run okay locally. If speed is unacceptable, you may want to try building Spin from source with the llm-metal or llm-cublas feature flag and see if that makes a difference.
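If you do try that, it should just be a matter of building the Spin repo with the feature enabled, something along these lines (feature names as they appear in Spin's Cargo.toml):

```sh
# from a clone of https://github.com/spinframework/spin
# use --features llm-cublas on NVIDIA hardware instead of llm-metal
cargo build --release --features llm-metal
```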
If you can't make it perform acceptably with local LLM, please ping me and I'll see if we can reset/extend your token limit for the demo (but do give us as much lead time as you can, and I can't promise anything).
Thanks for the reply, I did eventually find that link. Yes, the local LLM is slow for inference, but embeddings are okay, they're fast locally.
For the local LLM I think I have a different use case: the current Llama model isn't what I want to use, but Gemma 3 270M, so I'll have to work around this by making my own service separate from Spin's LLM.
I saw llm-metal here, but yeah, I'd probably need to review the whole codebase: https://github.com/spinframework/spin/blob/main/Cargo.toml
What I want is to plug any model into Spin, and if possible use SpinKube to deploy with Kubernetes, exposing a GPU/LLM service that Spin can call.
I might need to tinker with that, but if you have any leads, that would be great :) I'd really appreciate it
The only thing I can suggest offhand is to write your own k8s service that implements the cloud-gpu HTTP API (https://github.com/fermyon/spin-cloud-gpu/blob/main/fermyon-cloud-gpu/src/index.ts#L57 ff), then point at that in your runtime-config.toml (or SpinKube equivalent). cloud-gpu uses a very very simple API, and as far as I know there's nothing in the API that binds it to Fermyon Cloud.
(To be clear: the cloud-gpu plugin, and the Fermyon server-side implementation, are bound to Fermyon Cloud. But the API is not, so Spin would not care if you pointed it at your own implementation on your own k8s cluster.)
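For illustration, pointing Spin at your own service would look roughly like this in runtime-config.toml (a sketch from memory; the URL and token are placeholders, and the key names are worth double-checking against the runtime config docs):

```toml
# Route Serverless AI requests to a self-hosted service that speaks
# the same HTTP API as cloud-gpu; URL and token are placeholders
[llm_compute]
type = "remote_http"
url = "https://my-llm-service.example.com"
auth_token = "<token>"
```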
cc @karthik2804 as the cloud-gpu expert
(If it helps, I believe @seun-ja is working on an OpenAI API backend for Spin; I've no idea if that would be portable to a self-hosted service in k8s though.)
Thanks for the info, really appreciate it. I'll keep you guys posted once I start to tinker; I might make some PRs here and there... Thanks for all your work!
As @itowlson points out, spin-cloud-gpu is a very simple API, and with the PR to add an OpenAI API backend to Spin, it should become possible to point at any compatible server. In the meantime, the approach I have been using recently is to run the model of choice with Ollama and then consume it in a TypeScript app, as the npm ollama library just works. Here is an example: https://github.com/fermyon/fwf-examples/tree/main/samples/ai-sentiment-analysis-ollama.
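In case it's useful, the gist of that approach is sketched below (the host URL and model tag are placeholders; see the linked sample for the real code):

```ts
// Sketch: calling an Ollama server from a Spin TypeScript component.
// The host URL and model tag are placeholders.
import { Ollama } from "ollama";

const ollama = new Ollama({ host: "http://localhost:11434" });

export async function sentiment(text: string): Promise<string> {
  const response = await ollama.chat({
    model: "gemma3:270m",
    messages: [{ role: "user", content: `What is the sentiment of: ${text}` }],
  });
  return response.message.content;
}
```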