c0sogi
c0sogi
> @c0sogi Thanks for handling that! Do you see a chance not to repeat the batching process but unify it for sync and async? Can we work it out together?...
> @c0sogi It's a good idea to switch to the list of points instead of the batch, to make things consistent. > > I'm unsure if we should throw `NotImplementedError`...
> Async API was introduced in #7704. Awesome. Thanks for finishing this out.
```python # my_model_def.py from llama_api.schemas.models import ExllamaModel, LlamaCppModel, ReverseProxyModel gpt35 = ReverseProxyModel(model_path="https://api.openai.com") openai_replacement_models = {"gpt-3.5-turbo": "gpt35"} ``` ```python # test.py import requests url = "http://localhost:8000/v1/chat/completions" payload = { "model": "gpt-3.5-turbo",...
Yes It should be langchain-compatible. However, there's problem with dealing with None parameter in body. I've pushed changes to the `master` branch, so check it out. This should work. ```python...
Probably due to the too long response time of the 70B model, a timeout might be occurring and the swagger ui and requests module are not getting a response normally....
``` """ ------------------------------------------------------------------------------------- ''' !!! 첫 사용 시 반드시 상단 메뉴 [런타임] -> [런타임 유형 변경] -> [하드웨어 가속기]를 [GPU] 및 [T4]로 설정 * 이후 [런타임] -> [모두 실행] 클릭....
The model name will be `facebook/opt-125m` for example purposes. In `./app/models/llms.py`, find the `LLMModels` class. Then try adding this to the class members and reboot. ```python my_model = OpenAIModel( name="facebook/opt-125m",...