optillm
Optimizing inference proxy for LLMs
Hello, I'm working on minion now (https://github.com/femto/minion); a quick demo is here: https://www.youtube.com/watch?v=-LW7TCMUfLs&t=33s. Basically, what I want is for it to handle arbitrary types of queries: math, coding, QA, long novel writing, ...
**Description:** Using the MOA approach against Ollama's OpenAI-compatible endpoint results in a `list index out of range` error. The request fails to return a valid response....
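For context, a minimal sketch of the kind of request being reported, assuming optillm runs as an OpenAI-compatible proxy on its default port and the approach is selected via the `moa-` model-name prefix (the URL, port, API key, and model name here are illustrative, not taken from the report):

```python
from openai import OpenAI

# Point the client at the optillm proxy rather than at OpenAI directly.
# Base URL, port, and API key are assumptions for this sketch.
client = OpenAI(api_key="no_key", base_url="http://localhost:8000/v1")

# The "moa-" prefix asks optillm to apply the mixture-of-agents approach
# before forwarding to the underlying model (model name is illustrative).
response = client.chat.completions.create(
    model="moa-llama3.2",
    messages=[{"role": "user", "content": "What is 2 + 2?"}],
)
print(response.choices[0].message.content)
```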
### Symptoms

I used a llama-server with `OPENAI_API_KEY='no_key'`, but it doesn't work: optillm.py was sending requests to the OpenAI server, not to the llama-server.

```
2024-10-13 21:03:14,128 - INFO - HTTP Request: POST...
```
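One way to narrow this down is to check whether the llama-server itself answers when addressed directly; if it does, the proxy's upstream base URL is what needs to change. A diagnostic sketch, assuming llama-server exposes an OpenAI-compatible API on port 8080 (port and model name are assumptions):

```python
from openai import OpenAI

# Bypass optillm for a moment and talk to llama-server directly; llama-server
# ignores the API key, so the placeholder from the report is fine.
client = OpenAI(api_key="no_key", base_url="http://localhost:8080/v1")

resp = client.chat.completions.create(
    model="local-model",  # illustrative; llama-server typically ignores it
    messages=[{"role": "user", "content": "ping"}],
)
print(resp.choices[0].message.content)
```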
https://lightning.ai/studios
Initially brought up in #8: having a GUI would make it easier to visualize and compare the different approaches.
Hi there! This pull request shares a security update for optillm. We also have an entry for optillm in our directory, MseeP.ai, where we provide regular security and trust updates...
Add documentation showing how to use optillm with the local inference server to get logits. This is a commonly requested feature in ollama (https://github.com/ollama/ollama/issues/2415) that is already supported in optillm...
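As a sketch of what such documentation might show: requesting per-token log-probabilities through the OpenAI-compatible API, assuming optillm's local inference server is running on its default port (the port, API key, and model id below are assumptions):

```python
from openai import OpenAI

client = OpenAI(api_key="optillm", base_url="http://localhost:8000/v1")

# logprobs/top_logprobs are standard OpenAI chat-completion parameters;
# the idea is that the local inference server passes them through.
resp = client.chat.completions.create(
    model="meta-llama/Llama-3.2-1B-Instruct",  # illustrative local model id
    messages=[{"role": "user", "content": "The capital of France is"}],
    logprobs=True,
    top_logprobs=5,
    max_tokens=5,
)

# Each entry carries the sampled token and its log-probability.
for token_info in resp.choices[0].logprobs.content:
    print(token_info.token, token_info.logprob)
```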
Support the /completions endpoint for the built-in inference server. _Originally posted by @SeriousJ55 in https://github.com/codelion/optillm/discussions/168#discussioncomment-12403092_
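For reference, a sketch of the legacy text-completion call this would enable, assuming the built-in inference server keeps optillm's usual OpenAI-compatible address (the URL, key, and model id are illustrative):

```python
from openai import OpenAI

client = OpenAI(api_key="optillm", base_url="http://localhost:8000/v1")

# /completions takes a raw prompt string instead of a chat message list.
resp = client.completions.create(
    model="meta-llama/Llama-3.2-1B-Instruct",  # illustrative local model id
    prompt="Once upon a time",
    max_tokens=32,
)
print(resp.choices[0].text)
```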
Hi 👋🏻 Thanks for your work on OptiLLM! I've worked on integrating it into [Harbor](https://github.com/av/harbor) and came across a couple of nice-to-haves that might make the project friendlier under specific conditions....