
GPT4All / 'raw' llama.cpp support

Open KhazAkar opened this issue 1 year ago • 2 comments

Check for existing issues

  • [X] Completed

Describe the feature

Currently, zed.dev supports Ollama as a provider, but that's not ideal for some configurations because Ollama does not yet support Vulkan (there's a PR for it, but it's not merged). gpt4all.io supports running LLMs on the GPU via Vulkan, which speeds things up, and it also has a local server endpoint available. If it could be targeted through the existing configuration options, that would be great.

If applicable, add mockups / screenshots to help present your vision of the feature

Similar to ollama config:

  "assistant": {
    "version": "1",
    "provider": {
      "default_model": {
        "name": "name-of-model-file.gguf",
        "max_tokens": 2048,
        "keep_alive": -1
      },
      "name": "gpt4all" # or llama.cpp
    }
  },

This way, it would be possible to use a 'raw' llama.cpp build as well as the GPT4All Python bindings, which also expose API endpoints, so you don't need a UI running ;)

KhazAkar avatar Aug 17 '24 13:08 KhazAkar

Assuming the bindings support HTTP endpoints with the appropriate semantics (e.g. OpenAI), we do expose a custom endpoint setting that you could try.

If you get that working, I'd be happy to include some configuration notes in the docs.
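
For reference, a minimal sketch of that setting, assuming the bindings serve an OpenAI-compatible API on a local port (the URL and port here are placeholders; a complete example appears later in this thread):

  "language_models": {
    "openai": {
      "api_url": "http://localhost:8080/v1"
    }
  }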

notpeter avatar Aug 23 '24 16:08 notpeter

Those bindings seem to be OpenAI compatible; they have an example CLI server implementation in the repo. I might try to find time to get this working, but since I'm a bit short on it, I'm asking here 😁

KhazAkar avatar Aug 23 '24 20:08 KhazAkar

@rajivmehtaflex This is an unrelated enhancement request; please do not hijack it for your configuration issues. available_models is an array of objects, not an array of strings. Additionally, I'm not sure whether OpenRouter supports the Ollama REST API; I believe it only supports OpenAI API semantics. Please open a new issue.

notpeter avatar Sep 16 '24 13:09 notpeter

GPT4All implements an OpenAI-compatible API on port 4891, so it can be configured like any other OpenAI-compatible provider:

  "language_models": {
    "openai": {
      "api_url": "http://localhost:4891/v1",
      "available_models": [
        {
          "name": "model-name",
          "display_name": "Model Name Here",
          "max_tokens": 32768
        }
      ],
      "version": "1"
    }
  }
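
To sanity-check the endpoint before wiring it into Zed, you can point the official openai Python client at it. This is a sketch, assuming the GPT4All server is running on its default port 4891; model-name is a placeholder:

  from openai import OpenAI

  # Point the official openai client at the local server; local servers
  # typically ignore the API key, but the client requires a non-empty value.
  client = OpenAI(base_url="http://localhost:4891/v1", api_key="not-needed")

  response = client.chat.completions.create(
      model="model-name",  # placeholder: whichever model the server has loaded
      messages=[{"role": "user", "content": "Say hello."}],
      max_tokens=32,
  )
  print(response.choices[0].message.content)

If this prints a completion, the same api_url should work in the settings above.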

Thanks!

notpeter avatar Jan 21 '25 20:01 notpeter