lorax issues

Embedder Service v0 with FlashBert

1

Error: Warmup(Generation("'bool' object has no attribute 'dtype'"))

3

### System Info lorax latest docker, 2 A100, Ubuntu 22.04 ### Information - [X] Docker - [ ] The CLI directly ### Tasks - [ ] An officially supported command...

KrisWongz

bug

Support loading `.pt` weights

2

### Feature request Need support for loading models that only contain `.pt` weights ### Motivation I quantized Mixtral 8x7b model using HQQ (which produces a `qmodel.pt` file). But I am...

shripadk

bug

feat: support loading eetq quantized model

2

# What does this PR do? Fixes #391 ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if...

thincal

Async client to backoff when model overloaded

1

### Feature request Have the (async) client automatically backoff sending requests when the deployment is overloaded. ### Motivation When the async client exceeds the deployment queue capacity / rate limits,...

jppgks

enhancement
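The backoff feature request asks the async client to slow down automatically when the deployment is overloaded. One common way to do that is capped exponential backoff with full jitter. The sketch below shows the idea under stated assumptions: `OverloadedError` and `call_with_backoff` are hypothetical names used only for illustration. They are not part of the LoRAX client API, and the retry limits are arbitrary defaults.

```python
import asyncio
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


class OverloadedError(Exception):
    """Hypothetical stand-in for the error a client would raise when the deployment is overloaded."""


async def call_with_backoff(
    send: Callable[[], Awaitable[T]],
    max_retries: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
) -> T:
    """Await `send()`, retrying with capped exponential backoff while it raises OverloadedError."""
    for attempt in range(max_retries + 1):
        try:
            return await send()
        except OverloadedError:
            if attempt == max_retries:
                # Out of retries: surface the overload to the caller.
                raise
            # Delay ceiling doubles each attempt, capped at max_delay.
            delay = min(max_delay, base_delay * (2 ** attempt))
            # Full jitter: sleep a random time in [0, delay] so many clients do not retry in lockstep.
            await asyncio.sleep(random.uniform(0, delay))
    raise AssertionError("unreachable")
```

A caller would wrap each request in a zero-argument coroutine factory, for example `await call_with_backoff(lambda: client.generate(prompt))`, where `client.generate` is also hypothetical here. Jitter spreads retries from many concurrent requests over time instead of letting them hit the queue again at the same instant.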

Misleading/wrong openapi schema in REST API docs for structured output

### System Info Running image **ghcr.io/predibase/lorax:0.9.0** in kubernetes. ### Information - [X] Docker - [ ] The CLI directly ### Tasks - [X] An officially supported command - [ ]...

oscarjohansson94

documentation

Need some help: "You need to decrease --max-batch-prefill-tokens."

1

### System Info latest ### Information - [ ] Docker - [ ] The CLI directly ### Tasks - [ ] An officially supported command - [ ] My own...

KrisWongz

question

Upgrade to AWQ kernels v0.0.6

2

### Feature request Upgrade to latest AWQ kernels for better performance and less memory consumption. https://github.com/casper-hansen/AutoAWQ/pull/365 ### Motivation It seems that latest AWQ and Kernels changed a lot, including many...

thincal

enhancement

LoRAX server with 2 GPUs and multiple adapters becomes permanently faster in swapping ONLY after parallel execution of requests.

3

### System Info So I just noticed this very strange behaviour that has perhaps no severe implications but nevertheless is interesting to explore: I am hosting a Mixtral model on...

lighteternal

question

log arbitrary headers

1

### Feature request We should be able to log arbitrary headers to STDOUT for debugging / etc. We should be setting the names of the headers in the lorax startup...

noah-yoshida

good first issue
