#1155: Add support for OpenAI-compatible endpoint in LLM and Embed
Description
Adds support for a custom OpenAI-compatible endpoint, using the OpenAI-specific kwarg base_url for LLM and api_base for Embed. Tested against LocalAI.io for both LLM and Embed. The original issue asked for "endpoint" support, but base_url seemed the better choice to match the OpenAI kwargs.
Fixes #1155
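For context, this is roughly how an OpenAI-compatible base URL is consumed by the openai Python client; the actual wiring inside embedchain may differ, and the endpoint and models below are placeholders matching the test setup further down:

import os

from openai import OpenAI

# Point the client at any OpenAI-compatible server (e.g. LocalAI) instead of api.openai.com.
client = OpenAI(
    api_key=os.environ["OPENAI_API_KEY"],
    base_url="http://localhost:8180/v1",  # placeholder OpenAI-compatible endpoint
)

# Chat completion against the local endpoint.
chat = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "How are you doing?"}],
)

# Embedding against the same endpoint.
emb = client.embeddings.create(
    model="text-embedding-ada-002",
    input=["hello world"],
)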
Type of change
- [x] New feature (non-breaking change which adds functionality)
How Has This Been Tested?
- Tested locally against LocalAI.io with the following config:
import os

from embedchain import App

os.environ["OPENAI_API_KEY"] = "sk-xxxx"

app = App.from_config(config={
    "app": {
        "config": {
            "id": "test"
        }
    },
    "llm": {
        "provider": "openai",
        "config": {
            "model": "gpt-4",
            "temperature": 0.1,
            "max_tokens": 1000,
            "top_p": 1,
            "stream": False,
            "base_url": "http://localhost:8180/v1"
        },
    },
    "embedder": {
        "config": {
            "api_base": "http://localhost:8180/v1"
        }
    }
})
app.add("https://www.forbes.com/profile/elon-musk")
app.add("https://en.wikipedia.org/wiki/Elon_Musk")
out = app.query("What is the net worth of Elon Musk today?")
# Answer: The net worth of Elon Musk today is $258.7 billion.
print(out)
It appears perfectly functional based on the LocalAI logs. The RAG query itself returned "I dont know Elon's net worth", but all of the RAG search results and LLM prompt templating looked correct in the debug output. I personally plan to use EmbedChain with this change for my RAG ingest and search, but with my own template for synthesizing the search results, since running against a 7B model looks like it will need a little more tinkering than the default prompt provides to behave as expected (maybe there is an obvious way to change the prompt that I am missing?). I did not run any unit tests.
EDIT: Never mind, I found the prompt override: https://github.com/embedchain/embedchain/blob/9afc6878c82ee71332fa09aebecea93dd7829e7f/configs/full-stack.yaml#L18C5-L28C102
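For anyone else looking for the same thing, here is a rough sketch of overriding the prompt through the same Python config dict, assuming the "template" key and the $context/$query placeholders shown in the linked full-stack.yaml (key names may differ between embedchain versions):

import os

from embedchain import App

os.environ["OPENAI_API_KEY"] = "sk-xxxx"

app = App.from_config(config={
    "llm": {
        "provider": "openai",
        "config": {
            "model": "gpt-4",
            "base_url": "http://localhost:8180/v1",
            # Custom prompt; key name and placeholders assumed from the linked full-stack.yaml
            "template": (
                "Use the following pieces of context to answer the query at the end.\n"
                "If you don't know the answer, just say that you don't know.\n\n"
                "$context\n\n"
                "Query: $query\n\n"
                "Helpful Answer:"
            ),
        },
    },
    "embedder": {"config": {"api_base": "http://localhost:8180/v1"}},
})

out = app.query("What is the net worth of Elon Musk today?")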
Checklist:
- [x] My code follows the style guidelines of this project
- [x] I have performed a self-review of my own code
- [x] I have commented my code, particularly in hard-to-understand areas
- [ ] I have made corresponding changes to the documentation
- [ ] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my feature works
- [ ] New and existing unit tests pass locally with my changes
- [x] Any dependent changes have been merged and published in downstream modules
- [x] I have checked my code and corrected any misspellings
Maintainer Checklist
- [ ] closes xxxx (Replace xxxx with the GitHub issue number)
- [ ] Made sure Checks passed
LocalAI configs, in case you want to try this yourself:
gpt-4.yaml
name: gpt-4
mmap: true
parameters:
  # model: huggingface://TheBloke/Mistral-7B-OpenOrca-GGUF/mistral-7b-openorca.Q6_K.gguf
  model: openhermes-2.5-mistral-7b.Q6_K.gguf
  temperature: 0.2
  top_k: 40
  top_p: 0.95
template:
  chat_message: |
    <|im_start|>{{if eq .RoleName "assistant"}}assistant{{else if eq .RoleName "system"}}system{{else if eq .RoleName "user"}}user{{end}}
    {{if .Content}}{{.Content}}{{end}}
    <|im_end|>
  chat: |
    {{.Input}}
    <|im_start|>assistant
  completion: |
    {{.Input}}
context_size: 4096
gpu_layers: 55
f16: true
stopwords:
  - <|im_end|>
usage: |
  curl http://localhost:8180/v1/chat/completions -H "Content-Type: application/json" -d '{
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "How are you doing?", "temperature": 0.1}]
  }'
text-embedding-ada-002.yaml
name: text-embedding-ada-002
backend: sentencetransformers
embeddings: true
parameters:
  model: all-MiniLM-L6-v2
Can we merge this PR? It only seems to add more config and changes just the OpenAI Chat and Embedding classes to support local endpoints.
The failing test is an easy fix. It should not require the config or an env var for the base URL; the default should be None. I might get around to this this week. One-line change, IMO.
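For illustration, the fix described above might look roughly like this (a sketch only, not the actual diff; LlmConfig is a minimal stand-in for the real config object, and the attribute/env var names are assumptions):

import os

from openai import OpenAI

class LlmConfig:
    # Stand-in for the real LLM config object; base URL defaults to None.
    base_url = None

config = LlmConfig()

# Default the base URL to None so that neither the config nor an env var is required.
base_url = getattr(config, "base_url", None) or os.environ.get("OPENAI_API_BASE")

# The openai v1 client treats base_url=None as "use the default endpoint",
# so it can be passed through unconditionally.
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"], base_url=base_url)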
Yes, merging this PR for now. I will fix the tests in my follow-up PR.