Ollama Support
Is it possible to use llama3 via Ollama rather than the HuggingFace one?
Not possible at the moment, but it should be straightforward to implement if you'd like to give it a shot.
You can check the LlamaCpp docs from Guidance and change (preferably parametrize) the config:
https://github.com/jehna/humanify/blob/main/local-inference/guidance_config.py
I wonder if it's worth implementing a wrapper/abstraction layer like LiteLLM to make things more flexible?
- https://github.com/BerriAI/litellm
  > Call all LLM APIs using the OpenAI format. Use Bedrock, Azure, OpenAI, Cohere, Anthropic, Ollama, Sagemaker, HuggingFace, Replicate (100+ LLMs)
This is what projects like aider use:
- https://aider.chat/docs/llms/other.html
  > Aider uses the litellm package to connect to hundreds of other models. You can use `aider --model <model-name>` to use any supported model. To explore the list of supported models you can run `aider --models <model-name>` with a partial model name. If the supplied name is not an exact match for a known model, aider will return a list of possible matching models.
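To make the idea concrete, here is a minimal sketch of what a LiteLLM-style abstraction could look like for humanify's prompts. The `"provider/model"` naming and the `completion()` keyword arguments follow LiteLLM's documented convention, but the helper itself is an invented example, not a tested integration:

```python
# Sketch: build OpenAI-format arguments that litellm.completion() accepts.
# The "provider/model" prefix convention (e.g. "ollama/llama3") is LiteLLM's.
def build_completion_args(provider: str, model: str, prompt: str) -> dict:
    return {
        "model": f"{provider}/{model}",
        "messages": [{"role": "user", "content": prompt}],
    }

args = build_completion_args(
    "ollama", "llama3", "Rename the variable `a` to something descriptive."
)
# With litellm installed and an Ollama server running, the actual call would be:
#   import litellm
#   response = litellm.completion(**args, api_base="http://localhost:11434")
```

The appeal of this design is that swapping HuggingFace for Ollama (or any hosted API) becomes a change to one model string rather than a new backend implementation.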
Though I'm not currently sure if/how compatible that is with the guidance module you're currently using:
- https://github.com/guidance-ai/guidance
  > A guidance language for controlling large language models.
- https://github.com/guidance-ai/guidance#loading-models
  > Loading models
See also:
- https://github.com/guidance-ai/guidance/issues/599
- https://github.com/guidance-ai/guidance/issues/687
- https://github.com/guidance-ai/guidance/issues/648
- https://github.com/guidance-ai/guidance/issues/815
If you look here you can find that it can use LiteLLM, which itself can work with Ollama or various popular remote LLM services:
- https://github.com/guidance-ai/guidance/blob/main/guidance/models/_lite_llm.py
@jehna Curious, what aspects of guidance does humanify currently rely on? Is it using much of the deeper 'controls' provided by it?
Skimming the following prompt files:
- https://github.com/jehna/humanify/blob/main/local-inference/define.py
- https://github.com/jehna/humanify/blob/main/local-inference/rename.py
It looks like `gen`, `stop` and `stop_regex` are used:
- https://github.com/guidance-ai/guidance#basic-generation
- https://github.com/guidance-ai/guidance#regular-expressions
There's now v2 that runs on top of llama.cpp, so adding llama3 support should be even more straightforward.
@0xrsydn which version of llama3 were you planning to run? I could add it in to the new version
I think the recent one (llama3.1 8b) is great. Thanks btw
I researched a bit about Ollama. If I'm correct, you could run Ollama locally and Humanify could connect to its API to use any model that Ollama serves.
There seems to be an undocumented feature that allows passing GBNF grammars as an argument to the model: https://github.com/ollama/ollama/issues/3616#issuecomment-2068195083
...but judging from other open issues about the topic I'm not really sure if it works or not. But I'll give it a try!
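For reference, this is roughly the request body the patched Ollama build discussed in that thread is said to accept: a GBNF grammar passed through `options`. The `grammar` option name comes from the linked patch and comments; it is undocumented and only works with the patch applied, so treat all of this as an assumption:

```python
import json

# A toy GBNF grammar constraining output to a single identifier.
IDENTIFIER_GRAMMAR = r"""
root ::= [a-zA-Z_] [a-zA-Z0-9_]*
"""

payload = {
    "model": "llama3",
    "prompt": "Suggest a descriptive name for this variable: a = items.length",
    "stream": False,
    "options": {"grammar": IDENTIFIER_GRAMMAR},  # hypothetical, patch-only
}

body = json.dumps(payload)
# With a patched server running, this would be POSTed to
# http://localhost:11434/api/generate
```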
> ...but judging from other open issues about the topic I'm not really sure if it works or not
This seems like a good overarching/summarising issue; it still doesn't provide full clarity, but it links to seemingly all the related issues, and points out that now that OpenAI supports it, it has sort of become a higher priority:
- https://github.com/ollama/ollama/issues/6237
Based on my read of these:
> There are a few open PRs for this behaviour - the most recent one being https://github.com/ollama/ollama/pull/3618 - it would be amazing to get this merged in. It's a 2-line change that exposes the llama.cpp GBNF functionality via modelfile parameters. It's not my patch but I've compiled it and used it locally and it works really well.
>
> Originally posted by @ravenscroftj in https://github.com/ollama/ollama/issues/3616#issuecomment-2053939297
> Yes you can send the grammar as an option when you submit a request with the patch I linked to above enabled. It just isn't documented!
>
> Originally posted by @ravenscroftj in https://github.com/ollama/ollama/issues/3616#issuecomment-2068195083
It sounds like it's not currently possible to use the GBNF functionality on the current main/released version of Ollama.
According to this:
> My simple personal example is this. As a newer Ollama user I actually would like to try out both approaches to see which one works better for me and my product. Right now in Ollama I simply cannot, and from appearances (which can be deceiving) it appears that what's stopping me from testing these both in Ollama is a simple code change to expose the feature in llama.cpp to me. (edit: It was brought to my attention that Ollama actually uses GBNF internally to enforce json syntax, so the only thing that's really missing is exposing this feature to the end user to customize or use different grammar.)
>
> Originally posted by @Kinglord in https://github.com/ollama/ollama/issues/6237#issue-2453913814
It sounds like ollama currently supports JSON mode, and that is built as a GBNF grammar (presumably on top of llama.cpp's support of it), but that the ability to use a custom grammar isn't currently exposed to the end user.
Sadly @0xdevalias is correct: what you want to do @jehna will not work unless you patch your version of Ollama with the PR that was linked above; the release version still has no support for GBNF outside of the built-in JSON mode. Ollama still refuses to even reply to this issue for some really strange reason, and I have no idea why they won't talk about it at all and simply let the PRs keep rolling in and sit there. At this point all we can do is keep pressuring them by raising issues and making noise, both here and on the Discord, until we can get someone to take 10 minutes and explain the decision to essentially block this feature from end users in Ollama.
Thank you for looking into this. I just pushed an `ollama-support` branch that should start working if they start supporting the grammar flag.
☝️ added llama3.1 8b model support
How do we use Ollama with this? Sorry if this is a dumb question.
@dangelo352 unfortunately there's no Ollama support yet. You can run the model locally using `humanify local`.
Since v2.2.0 there's now a configurable `--baseURL` parameter in the OpenAI mode. Unfortunately Ollama does not yet support structured outputs, although I'm sure it's on their roadmap, as the official OpenAI API supports it now.
Please keep an eye on ollama/ollama#6473; as soon as they add the support and close that issue, you should be able to use Ollama as the humanify backend by:

    ollama pull phi3.5 # or some other compatible model
    humanify openai -k 'not-needed' --baseURL 'http://localhost:11434/v1' minified-file.js -m phi3.5
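Under the hood, pointing an OpenAI-style client at that `--baseURL` means the requests land on Ollama's OpenAI compatibility layer. A minimal sketch of the request shape (the exact prompt humanify sends is an assumption here; only the endpoint shape follows Ollama's `/v1` compatibility API):

```python
import json

BASE_URL = "http://localhost:11434/v1"  # Ollama's OpenAI-compatible endpoint

def chat_request(model: str, prompt: str) -> tuple[str, bytes]:
    """Build the URL and JSON body for an OpenAI-format chat completion."""
    url = f"{BASE_URL}/chat/completions"
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return url, body

url, body = chat_request("phi3.5", "Suggest a better name for variable `a`.")
# With a local Ollama server running:
#   urllib.request.urlopen(urllib.request.Request(url, data=body,
#       headers={"Content-Type": "application/json"}))
```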
> Please keep an eye on ollama/ollama#6473; as soon as they add the support and close that issue, you should be able to use Ollama as the humanify backend
That issue is closed now:
> This will get rolled out with https://github.com/ollama/ollama/issues/7900!
>
> Originally posted by @ParthSareen in https://github.com/ollama/ollama/issues/6473#issuecomment-2518850099
There is a more detailed note in this related issue:
> Hey everyone!
>
> With the merging of #7900, we're introducing structured output to be able to go from a json schema to structured generation! Really appreciate all the feedback and contributions. Extremely thankful for all of you being so involved in this 🙏🏽
>
> There are a few things we're still keeping in mind over the next few months. The first focus is going to be around performance - speed and accuracy. There has been a lot of research coming out around this, we're keeping a close eye and are going to see how we can integrate some of this into Ollama. We're also thinking about how to support structured generation in the long term and that'll play nicely with a lot of the work we're doing on our new engine.
>
> Stoked for the coming few months, hope to improve both performance and accuracy around sampling and constrained decoding.
>
> Thank you again for your patience, we're super excited to get this out in an upcoming release! Will spin out more issues around this as well - happy to keep you all posted as well!
>
> Originally posted by @ParthSareen in https://github.com/ollama/ollama/issues/6237#issuecomment-2518836758
Looking at the issue it links to:
- https://github.com/ollama/ollama/pull/7900
  > Structured Outputs - Chat Endpoint
  >
  > Structured outputs
  > A longtime ask from the community - we now support passing in a JSON schema, translating it to a grammar, and using it for sampling.
  >
  > Why not full grammar support
  > We gave this a ton of thought and there are 3 main points here:
  > - Inherent complexity of grammars - Generating a grammar for the average user is not a great experience and should be one that is abstracted away from them. Digging into the code, the API layer also needs some TLC, which would mean changing some interfaces on Ollama's end while maintaining a consistent UX.
  > - Sampling performance - there are many new papers and methodologies for grammars (outlines, xgrammar, etc). We want to keep grammar generation and sampling coupled to improve the performance of sampling down the road.
  > - Parity with existing experiences - other client SDKs (e.g. OpenAI) already support structured outputs and it's imperative we keep the experience simple on our end but also support those.
  >
  > Originally posted by @ParthSareen in https://github.com/ollama/ollama/pull/7900#issue-2708318412
This pull request first appeared in v0.5.0-rc1:
- https://github.com/ollama/ollama/releases/tag/v0.5.0
  > Structured outputs
  > Ollama now supports structured outputs, making it possible to constrain a model's output to a specific format defined by a JSON schema. The Ollama Python and JavaScript libraries have been updated to support structured outputs, together with Ollama's OpenAI-compatible API endpoints.
  >
  > REST API
  > To use structured outputs in Ollama's generate or chat APIs, provide a JSON schema object in the `format` parameter:
  > ..snip..
  >
  > JavaScript library
  > Using the Ollama JavaScript library, pass in the schema as a JSON object to the `format` parameter as either `object` or use Zod (recommended) to serialize the schema using `zodToJsonSchema()`:
  > ..snip..
- https://github.com/ollama/ollama/releases/tag/v0.5.1
  > Fixed issue where Ollama's API would generate JSON output when specifying `"format": null`
- https://github.com/ollama/ollama/releases/tag/v0.5.3
  > Fixed issue where setting the `format` field to `""` would cause an error
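To show what the release notes describe in practice, here is a sketch of a `/api/chat` request using the v0.5.0+ `format` parameter: a JSON schema goes in, and the reply's `message.content` is constrained to parse against it. The "rename suggestion" schema below is an invented example for illustration, not humanify's actual schema:

```python
import json

# Invented example schema: a structured rename suggestion.
schema = {
    "type": "object",
    "properties": {
        "newName": {"type": "string"},
        "reason": {"type": "string"},
    },
    "required": ["newName"],
}

payload = {
    "model": "llama3.1",
    "messages": [{"role": "user", "content": "Rename the variable `a`."}],
    "format": schema,   # JSON schema, translated to a grammar server-side
    "stream": False,
}
# POST to http://localhost:11434/api/chat with this body on Ollama >= v0.5.0.

# A conforming response's message content then parses cleanly:
sample_content = '{"newName": "itemCount", "reason": "it stores a length"}'
result = json.loads(sample_content)
print(result["newName"])  # → itemCount
```

This is the piece that was missing back when only the undocumented GBNF patch existed: the constraint is now a first-class, documented request field.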
I'm not sure if that is sufficient for humanify to work with Ollama, and if it is, whether it will work as-is or if something more needs to be done on this side to make it work. @jehna would probably be the best to know that off-hand.
While I haven't looked into it deeper, I noticed that there is an Ollama JS SDK, so if we wanted to build more specific support into humanify, perhaps that would be a good place to start:
- https://github.com/ollama/ollama-js
  > Ollama JavaScript Library. The Ollama JavaScript library provides the easiest way to integrate your JavaScript project with Ollama.
See also:
- https://github.com/jehna/humanify/issues/400
- https://github.com/jehna/humanify/issues/84
- https://github.com/jehna/humanify/issues/416
- https://github.com/jehna/humanify/issues/419
- https://github.com/jehna/humanify/pull/646
  > This adds the `ollama` command that can be used, and uses the default model `gpt-oss:20b`
- https://github.com/jehna/humanify/pull/647
- https://ollama.com/blog/cloud-models
  > Cloud models (September 19, 2025)
  >
  > Cloud models are now in preview, letting you run larger models with fast, datacenter-grade hardware. You can keep using your local tools while running larger models that wouldn't fit on a personal computer. Ollama's cloud does not retain your data to ensure privacy and security.
  > The same Ollama experience is now seamless across both local and cloud, integrating with the existing tools you already use. Ollama's cloud models also work via Ollama's OpenAI-compatible API.
- https://ollama.com/cloud
  > Run larger models, faster using Ollama's cloud: $20/mo
  >
  > What are the usage limits for Ollama's cloud?
  > Ollama's cloud includes hourly and daily limits to avoid capacity issues. Usage-based pricing will soon be available to consume models in a metered fashion.