Feature Request
Support for fully local AI with Ollama
Let me know if you want to collab.
You can do this!!
Create a ~/.code_puppy/extra_models.json and put this in:
{
  "qwen3-coder-30b": {
    "type": "custom_openai",
    "name": "Qwen3-Coder-30B-A3B-Instruct",
    "custom_endpoint": {
      "url": "http://localhost:11434"
    },
    "context_length": 256000
  }
}
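One caveat: the name has to correspond to a model that Ollama actually has pulled locally. Assuming the upstream tag for this model is qwen3-coder:30b, that would be something like:

ollama pull qwen3-coder:30b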
I have tested this, and the Ollama API contract requires /v1. Try this:
{ "qwen3-coder-30b": { "type": "custom_openai", "name": "qwen3-coder:30b", "custom_endpoint": { "url": "http://localhost:11434/v1/" }, "context_length": 256000 } }
Also note that Ollama has a default context length of 4,096 tokens, from what I can find. You can increase it by setting the OLLAMA_CONTEXT_LENGTH environment variable, but the higher you set it, the more VRAM you need.
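For example, to start the server with a larger window (the 32768 below is just a sketch; size it to your VRAM):

# Set the context length before starting the server
OLLAMA_CONTEXT_LENGTH=32768 ollama serve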
Interesting, thank you! -Trevor
@tjdodson - I would highly recommend using LM Studio instead of Ollama. You'll have a much better experience and get faster, better inference.
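If you go that route, the same extra_models.json shape should work pointed at LM Studio's OpenAI-compatible server, which listens on port 1234 by default. The name below is whatever identifier LM Studio shows for your loaded model, so treat it as a placeholder:

{
  "qwen3-coder-30b": {
    "type": "custom_openai",
    "name": "qwen3-coder-30b-a3b-instruct",
    "custom_endpoint": {
      "url": "http://localhost:1234/v1/"
    },
    "context_length": 256000
  }
}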
Closing, as this is already supported.