Ollama Support
Is it possible to use llama3 via Ollama rather than the HuggingFace one?
Not possible at the moment, but it should be straightforward to implement if you'd like to give it a shot.
You can check the LlamaCpp docs from Guidance and change (preferably parametrize) the config:
https://github.com/jehna/humanify/blob/main/local-inference/guidance_config.py
I wonder if it's worth implementing a wrapper/abstraction layer like LiteLLM to make things more flexible?
- https://github.com/BerriAI/litellm
  > Call all LLM APIs using the OpenAI format. Use Bedrock, Azure, OpenAI, Cohere, Anthropic, Ollama, Sagemaker, HuggingFace, Replicate (100+ LLMs)
This is what projects like aider use:
- https://aider.chat/docs/llms/other.html
  > Aider uses the litellm package to connect to hundreds of other models. You can use `aider --model <model-name>` to use any supported model. To explore the list of supported models you can run `aider --models <model-name>` with a partial model name. If the supplied name is not an exact match for a known model, aider will return a list of possible matching models.
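To make the idea concrete, here is a minimal sketch of what a LiteLLM-style abstraction could look like for humanify's prompts. The `"provider/model"` naming and the `completion()` keyword arguments follow LiteLLM's documented convention, but the helper itself is an invented example, not a tested integration:

```python
# Sketch: build OpenAI-format arguments that litellm.completion() accepts.
# The "provider/model" prefix convention (e.g. "ollama/llama3") is LiteLLM's.
def build_completion_args(provider: str, model: str, prompt: str) -> dict:
    return {
        "model": f"{provider}/{model}",
        "messages": [{"role": "user", "content": prompt}],
    }

args = build_completion_args(
    "ollama", "llama3", "Rename the variable `a` to something descriptive."
)
# With litellm installed and an Ollama server running, the actual call would be:
#   import litellm
#   response = litellm.completion(**args, api_base="http://localhost:11434")
```

The appeal of this design is that swapping HuggingFace for Ollama (or any hosted API) becomes a change to one model string rather than a new backend implementation.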
Though I'm not currently sure if/how compatible that is with the guidance module you're currently using:
- https://github.com/guidance-ai/guidance
  > A guidance language for controlling large language models.
- https://github.com/guidance-ai/guidance#loading-models
  > Loading models
See also:
- https://github.com/guidance-ai/guidance/issues/599
- https://github.com/guidance-ai/guidance/issues/687
- https://github.com/guidance-ai/guidance/issues/648
- https://github.com/guidance-ai/guidance/issues/815
If you look here you can find that it can use LiteLLM, which itself can work with Ollama or various popular remote LLM services:
- https://github.com/guidance-ai/guidance/blob/main/guidance/models/_lite_llm.py
@jehna Curious, what aspects of guidance does humanify currently rely on? Is it using much of the deeper 'controls' provided by it?
Skimming the following prompt files:
- https://github.com/jehna/humanify/blob/main/local-inference/define.py
- https://github.com/jehna/humanify/blob/main/local-inference/rename.py
It looks like `gen`, `stop` and `stop_regex` are used:
- https://github.com/guidance-ai/guidance#basic-generation
- https://github.com/guidance-ai/guidance#regular-expressions
There's now v2 that runs on top of llama.cpp, so adding llama3 support should be even more straightforward.
@0xrsydn which version of llama3 were you planning to run? I could add it in to the new version
I think the recent one (llama3.1 8b) is great. Thanks btw
I researched a bit about Ollama. If I'm correct, you could run Ollama locally and Humanify could connect to its API to use any model that Ollama serves.
There seems to be an undocumented feature that allows passing GBNF grammars as an argument to the model: https://github.com/ollama/ollama/issues/3616#issuecomment-2068195083
...but judging from other open issues about the topic I'm not really sure if it works or not. But I'll give it a try!
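For reference, this is roughly the request body the patched Ollama build discussed in that thread is said to accept: a GBNF grammar passed through `options`. The `grammar` option name comes from the linked patch and comments; it is undocumented and only works with the patch applied, so treat all of this as an assumption:

```python
import json

# A toy GBNF grammar constraining output to a single identifier.
IDENTIFIER_GRAMMAR = r"""
root ::= [a-zA-Z_] [a-zA-Z0-9_]*
"""

payload = {
    "model": "llama3",
    "prompt": "Suggest a descriptive name for this variable: a = items.length",
    "stream": False,
    "options": {"grammar": IDENTIFIER_GRAMMAR},  # hypothetical, patch-only
}

body = json.dumps(payload)
# With a patched server running, this would be POSTed to
# http://localhost:11434/api/generate
```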
> ...but judging from other open issues about the topic I'm not really sure if it works or not
This seems like a good overarching/summarising issue; it still doesn't provide full clarity, but it links to seemingly all the related issues, and points out that now that OpenAI supports it, it has sort of become a higher priority:
- https://github.com/ollama/ollama/issues/6237
Based on my read of these:
> There are a few open PRs for this behaviour - the most recent one being https://github.com/ollama/ollama/pull/3618 - it would be amazing to get this merged in. It's a 2-line change that exposes the llama.cpp GBNF functionality via modelfile parameters. It's not my patch but I've compiled it and used it locally and it works really well.
>
> Originally posted by @ravenscroftj in https://github.com/ollama/ollama/issues/3616#issuecomment-2053939297
> Yes you can send the grammar as an option when you submit a request with the patch I linked to above enabled. It just isn't documented!
>
> Originally posted by @ravenscroftj in https://github.com/ollama/ollama/issues/3616#issuecomment-2068195083
It sounds like it's not currently possible to use the GBNF functionality on the current main/released version of Ollama.
According to this:
> My simple personal example is this. As a newer Ollama user I actually would like to try out both approaches to see which one works better for me and my product. Right now in Ollama I simply cannot, and from appearances (which can be deceiving) it appears that what's stopping me from testing these both in Ollama is a simple code change to expose the feature in llama.cpp to me. (edit: It was brought to my attention that Ollama actually uses GBNF internally to enforce json syntax, so the only thing that's really missing is exposing this feature to the end user to customize or use different grammar.)
>
> Originally posted by @Kinglord in https://github.com/ollama/ollama/issues/6237#issue-2453913814
It sounds like ollama currently supports JSON mode, and that is built as a GBNF grammar (presumably on top of llama.cpp's support of it), but that the ability to use a custom grammar isn't currently exposed to the end user.
Sadly @0xdevalias is correct: what you want to do @jehna will not work unless you patch your version of Ollama with the PR that was linked above; the release version still has no support for GBNF outside of the built-in JSON mode. Ollama still refuses to even reply to this issue for some really strange reason, and I have no idea why they won't talk about it at all and simply let the PRs keep rolling in and sit there. At this point all we can do is keep pressuring them by raising issues and making noise, both here and on the Discord, until we can get someone to take 10 minutes and explain the decision to essentially block this feature from end users in Ollama.
Thank you for looking into this. I just pushed an `ollama-support` branch that should start working if they start supporting the grammar flag.
☝️ added llama3.1 8b model support
How do we use Ollama with this? Sorry if this is a dumb question.
@dangelo352 unfortunately there's no Ollama support yet. You can run the model locally using `humanify local`.
Since v2.2.0 there's now a configurable `--baseURL` parameter in the OpenAI mode. Unfortunately Ollama does not yet support structured outputs, although I'm sure it's on their roadmap, as the official OpenAI API supports it now.
Please keep an eye on ollama/ollama#6473; as soon as they add the support and close that issue, you should be able to use Ollama as the humanify backend by:

    ollama pull phi3.5 # or some other compatible model
    humanify openai -k 'not-needed' --baseURL 'http://localhost:11434/v1' minified-file.js -m phi3.5
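Under the hood, pointing an OpenAI-style client at that `--baseURL` means the requests land on Ollama's OpenAI compatibility layer. A minimal sketch of the request shape (the exact prompt humanify sends is an assumption here; only the endpoint shape follows Ollama's `/v1` compatibility API):

```python
import json

BASE_URL = "http://localhost:11434/v1"  # Ollama's OpenAI-compatible endpoint

def chat_request(model: str, prompt: str) -> tuple[str, bytes]:
    """Build the URL and JSON body for an OpenAI-format chat completion."""
    url = f"{BASE_URL}/chat/completions"
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return url, body

url, body = chat_request("phi3.5", "Suggest a better name for variable `a`.")
# With a local Ollama server running:
#   urllib.request.urlopen(urllib.request.Request(url, data=body,
#       headers={"Content-Type": "application/json"}))
```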
> Please keep an eye on ollama/ollama#6473; as soon as they add the support and close that issue, you should be able to use Ollama as the humanify backend
That issue is closed now:
> This will get rolled out with https://github.com/ollama/ollama/issues/7900!
>
> Originally posted by @ParthSareen in https://github.com/ollama/ollama/issues/6473#issuecomment-2518850099
There is a more detailed note in this related issue:
> Hey everyone!
>
> With the merging of #7900, we're introducing structured output to be able to go from a json schema to structured generation! Really appreciate all the feedback and contributions. Extremely thankful for all of you being so involved in this 🙏🏽
>
> There are a few things we're still keeping in mind over the next few months. The first focus is going to be around performance - speed and accuracy. There has been a lot of research coming out around this, we're keeping a close eye and are going to see how we can integrate some of this into Ollama. We're also thinking about how to support structured generation in the long term and that'll play nicely with a lot of the work we're doing on our new engine.
>
> Stoked for the coming few months, hope to improve both performance and accuracy around sampling and constrained decoding.
>
> Thank you again for your patience, we're super excited to get this out in an upcoming release! Will spin out more issues around this as well - happy to keep you all posted as well!
>
> Originally posted by @ParthSareen in https://github.com/ollama/ollama/issues/6237#issuecomment-2518836758
Looking at the issue it links to:
- https://github.com/ollama/ollama/pull/7900
  > Structured Outputs - Chat Endpoint
  >
  > Structured outputs
  > A longtime ask from the community - we now support passing in a JSON schema, translating it to a grammar, and using it for sampling.
  >
  > Why not full grammar support
  > We gave this a ton of thought and there are 3 main points here:
  > - Inherent complexity of grammars - Generating a grammar for the average user is not a great experience and should be one that is abstracted away from them. Digging into the code, the API layer also needs some TLC, which would mean changing some interfaces on Ollama's end while maintaining a consistent UX.
  > - Sampling performance - there are many new papers and methodologies for grammars (outlines, xgrammar, etc). We want to keep grammar generation and sampling coupled to improve the performance of sampling down the road.
  > - Parity with existing experiences - other client SDKs (e.g. OpenAI) already support structured outputs and it's imperative we keep the experience simple on our end but also support those.
  >
  > Originally posted by @ParthSareen in https://github.com/ollama/ollama/pull/7900#issue-2708318412
This pull request first appeared in v0.5.0-rc1:
- https://github.com/ollama/ollama/releases/tag/v0.5.0
  > Structured outputs
  > Ollama now supports structured outputs, making it possible to constrain a model's output to a specific format defined by a JSON schema. The Ollama Python and JavaScript libraries have been updated to support structured outputs, together with Ollama's OpenAI-compatible API endpoints.
  >
  > REST API
  > To use structured outputs in Ollama's generate or chat APIs, provide a JSON schema object in the `format` parameter:
  > ..snip..
  >
  > JavaScript library
  > Using the Ollama JavaScript library, pass in the schema as a JSON object to the `format` parameter as either `object` or use Zod (recommended) to serialize the schema using `zodToJsonSchema()`:
  > ..snip..
- https://github.com/ollama/ollama/releases/tag/v0.5.1
  > Fixed issue where Ollama's API would generate JSON output when specifying `"format": null`
- https://github.com/ollama/ollama/releases/tag/v0.5.3
  > Fixed issue where setting the `format` field to `""` would cause an error
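To show what the release notes describe in practice, here is a sketch of a `/api/chat` request using the v0.5.0+ `format` parameter: a JSON schema goes in, and the reply's `message.content` is constrained to parse against it. The "rename suggestion" schema below is an invented example for illustration, not humanify's actual schema:

```python
import json

# Invented example schema: a structured rename suggestion.
schema = {
    "type": "object",
    "properties": {
        "newName": {"type": "string"},
        "reason": {"type": "string"},
    },
    "required": ["newName"],
}

payload = {
    "model": "llama3.1",
    "messages": [{"role": "user", "content": "Rename the variable `a`."}],
    "format": schema,   # JSON schema, translated to a grammar server-side
    "stream": False,
}
# POST to http://localhost:11434/api/chat with this body on Ollama >= v0.5.0.

# A conforming response's message content then parses cleanly:
sample_content = '{"newName": "itemCount", "reason": "it stores a length"}'
result = json.loads(sample_content)
print(result["newName"])  # → itemCount
```

This is the piece that was missing back when only the undocumented GBNF patch existed: the constraint is now a first-class, documented request field.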
I'm not sure if that is sufficient for humanify to work with Ollama, and if it is, whether it will work as-is or if something more needs to be done on this side to make it work. @jehna would probably be the best to know that off-hand.
While I haven't looked into it deeper, I noticed that there is an Ollama JS SDK, so if we wanted to build more specific support into humanify, perhaps that would be a good place to start:
- https://github.com/ollama/ollama-js
  > Ollama JavaScript Library. The Ollama JavaScript library provides the easiest way to integrate your JavaScript project with Ollama.
See also:
- https://github.com/jehna/humanify/issues/400
- https://github.com/jehna/humanify/issues/84
- https://github.com/jehna/humanify/issues/416
- https://github.com/jehna/humanify/issues/419
- https://github.com/jehna/humanify/pull/646
  > This adds the `ollama` command that can be used, and uses the default model `gpt-oss:20b`
- https://github.com/jehna/humanify/pull/647
- https://ollama.com/blog/cloud-models
  > Cloud models (September 19, 2025)
  >
  > Cloud models are now in preview, letting you run larger models with fast, datacenter-grade hardware. You can keep using your local tools while running larger models that wouldn't fit on a personal computer. Ollama's cloud does not retain your data to ensure privacy and security.
  > The same Ollama experience is now seamless across both local and cloud, integrating with the existing tools you already use. Ollama's cloud models also work via Ollama's OpenAI-compatible API.
- https://ollama.com/cloud
  > Run larger models, faster using Ollama's cloud: $20/mo
  >
  > What are the usage limits for Ollama's cloud?
  > Ollama's cloud includes hourly and daily limits to avoid capacity issues. Usage-based pricing will soon be available to consume models in a metered fashion.