lorax
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
### System Info
- ghcr.io/predibase/lorax:a8ca5cb
- Ubuntu 20.04
- GPU A10G

### Information
- [X] Docker
- [ ] The CLI directly

### Tasks
- [X] An officially supported command...
### System Info I am using a Docker image that is 2 days old: ghcr.io/predibase/lorax:main. Host NVIDIA info: NVIDIA-SMI 550.54.14, Driver Version 550.54.14, CUDA...
# What does this PR do?
Fixes # (issue)

## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks...
### System Info We are using LoRAX 0.12.1, and we have noticed that requests for different adapters can sometimes affect each other. In particular, sometimes one request's input could...
Bumps [torch](https://github.com/pytorch/pytorch) from 2.4.0 to 2.6.0. Release notes sourced from torch's releases: PyTorch 2.6.0 Release Highlights, Tracked Regressions, Backwards Incompatible Changes, Deprecations, New Features, Improvements, Bug fixes, Performance, Documentation, Developers...
### System Info Spec: AWS g6e.12xlarge. Hi, I'm trying out LoRAX. I ran a Docker container with the image tag main (ghcr.io/predibase/lorax:main) and ran into some kernel issues. Attaching logs...
### System Info
When I use return_k_alternatives with this generation config:
```
generation_config = {
    "max_new_tokens": 64,
    "temperature": 1,
    "do_sample": True,
    "top_p": 0.8,
    "top_k": 10,
    "return_k_alternatives": 5
}
```
it causes...
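For context, the config above is typically sent to the server inside the `parameters` object of a TGI-style `/generate` request. The sketch below only assembles that JSON payload; the endpoint shape and the `adapter_id` field are assumptions based on LoRAX's TGI-derived API, not verified against this specific build.

```python
import json

def build_generate_payload(prompt, adapter_id=None):
    """Assemble a /generate request body carrying the sampling settings
    from the report above, including return_k_alternatives."""
    parameters = {
        "max_new_tokens": 64,
        "temperature": 1,
        "do_sample": True,
        "top_p": 0.8,
        "top_k": 10,
        "return_k_alternatives": 5,
    }
    if adapter_id is not None:
        # LoRAX routes the request to a specific fine-tuned adapter.
        parameters["adapter_id"] = adapter_id

    return {"inputs": prompt, "parameters": parameters}

# Build (but do not send) a request for a hypothetical adapter name.
payload = build_generate_payload("Hello", adapter_id="my-org/my-adapter")
body = json.dumps(payload)
```

Posting `body` to the server's `/generate` endpoint would then exercise the same code path as the report.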
### System Info
**Lorax version:**
Name: lorax-client
Version: 0.6.3
Summary: LoRAX Python Client
Home-page: https://github.com/predibase/lorax
Author: Travis Addair
Author-email: [email protected]
License: Apache-2.0
Location: /mnt/share/ai_studio/.venv/lib/python3.11/site-packages
Requires: aiohttp, certifi, huggingface-hub, pydantic
Required-by:...
OpenAI has already deprecated `max_tokens` in favor of [max_completion_tokens](https://platform.openai.com/docs/api-reference/chat/create#chat-create-max_completion_tokens).
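One way an OpenAI-compatible endpoint could stay backward compatible is to accept both fields and normalize the deprecated one. This is a minimal sketch of that migration, assuming plain dict payloads; the function name is hypothetical, not part of any library.

```python
def migrate_max_tokens(payload):
    """Return a copy of an OpenAI-style chat request body in which the
    deprecated max_tokens field is renamed to max_completion_tokens.
    If the caller already set max_completion_tokens, it wins and the
    stale max_tokens value is dropped."""
    out = dict(payload)
    if "max_tokens" in out:
        out.setdefault("max_completion_tokens", out["max_tokens"])
        del out["max_tokens"]
    return out

legacy = {"model": "m", "messages": [], "max_tokens": 64}
migrated = migrate_max_tokens(legacy)
```

Here `migrated` carries `max_completion_tokens: 64` and no `max_tokens` key, so the same request works against servers that have dropped the old field.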