Lukas Kreussel
The `llama.cpp:light` Docker image exits with exit code 132 when loading the model on both of my AMD-based systems, hinting at a missing CPU instruction. If I try to run the...
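For context, exit code 132 is 128 + 4, i.e. the process was killed by SIGILL (illegal instruction), which usually means the binary was compiled with SIMD extensions the host CPU lacks. A minimal sketch for checking which x86 extensions a machine actually supports, using the standard library's runtime feature detection (the feature list below is illustrative, picked to mirror flags llama.cpp builds commonly enable):

```rust
// Minimal sketch: print which x86 SIMD extensions the host CPU supports.
// Adjust the feature list to match whatever the binary was compiled with.
fn main() {
    #[cfg(target_arch = "x86_64")]
    {
        for (name, supported) in [
            ("sse3", is_x86_feature_detected!("sse3")),
            ("avx", is_x86_feature_detected!("avx")),
            ("avx2", is_x86_feature_detected!("avx2")),
            ("fma", is_x86_feature_detected!("fma")),
            ("f16c", is_x86_feature_detected!("f16c")),
        ] {
            println!("{name}: {supported}");
        }
    }
}
```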
When I try to install and use this package via a requirements file in the default Python 3.10 container, I get the following error when I try to import the...
Setting up cuda-toolkit on a standard Windows GitHub Actions runner can take well over 15 minutes. I tried to speed this up by using the network installer and specifying only the sub-packages...
### Feature request Since I spotted [bert_quant.rs](https://github.com/huggingface/text-embeddings-inference/blob/main/backends/candle/src/models/bert_quant.rs) in the candle backend, I was curious whether it is currently possible to point the embedding server to a "*.gguf" file and load...
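For reference, candle can already parse GGUF containers; a minimal sketch, assuming `candle-core`'s `quantized::gguf_file` module as used in the quantized model examples, that opens a `.gguf` file and lists its metadata keys and tensors:

```rust
use candle_core::quantized::gguf_file;

fn main() -> anyhow::Result<()> {
    // Open a GGUF file and parse its header (metadata plus tensor directory).
    let mut file = std::fs::File::open("model.gguf")?;
    let content = gguf_file::Content::read(&mut file)?;

    // Inspect what the container actually holds.
    for key in content.metadata.keys() {
        println!("metadata key: {key}");
    }
    for (name, info) in content.tensor_infos.iter() {
        println!("tensor: {name} shape={:?} dtype={:?}", info.shape, info.ggml_dtype);
    }
    Ok(())
}
```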
Fixes #247 Since we now depend on `pyo3` in `core`, we need to include `libpython` in our runtime container. Maybe we could put this `pyo3` dependency behind a feature flag...
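A sketch of what the feature-gated approach could look like on the Rust side, assuming a Cargo feature named `python` with `pyo3` declared as an optional dependency in `Cargo.toml` (`pyo3 = { ..., optional = true }` and `python = ["dep:pyo3"]`); the function name is a hypothetical stand-in, and the `py.run` call assumes the pre-0.21 pyo3 API:

```rust
// Hypothetical sketch: everything touching pyo3 sits behind a `python`
// Cargo feature, so default builds don't link against libpython.

#[cfg(feature = "python")]
pub fn run_python_hook(code: &str) -> Result<(), String> {
    use pyo3::prelude::*;
    // Acquire the GIL and execute the snippet (pre-0.21 pyo3 API).
    Python::with_gil(|py| py.run(code, None, None)).map_err(|e| e.to_string())
}

#[cfg(not(feature = "python"))]
pub fn run_python_hook(_code: &str) -> Result<(), String> {
    // Stub keeps callers compiling when the feature is disabled.
    Err("built without the `python` feature".into())
}
```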
**Describe the bug** If two requests are sent to the server at roughly the same time, it will start to respond to both requests and then crash with the following...
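Until the underlying race is fixed, one possible mitigation, sketched below under the assumption that the crash comes from two threads driving a single non-thread-safe model concurrently, is to serialize access with a mutex (the `Model` type and `generate` method are hypothetical stand-ins):

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical stand-in for a non-thread-safe model/session.
struct Model;

impl Model {
    fn generate(&mut self, prompt: &str) -> String {
        format!("response to: {prompt}")
    }
}

fn main() {
    // Wrap the model so only one request can drive it at a time.
    let model = Arc::new(Mutex::new(Model));

    let handles: Vec<_> = ["first request", "second request"]
        .into_iter()
        .map(|prompt| {
            let model = Arc::clone(&model);
            thread::spawn(move || {
                // The lock serializes the two "simultaneous" requests.
                let mut guard = model.lock().unwrap();
                println!("{}", guard.generate(prompt));
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }
}
```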
Right now the host-side data of a tensor isn't freed after it is offloaded to a GPU. We should fix that to enable users to run bigger models that are split...
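A minimal sketch of the intended behavior, with hypothetical types since the real tensor internals aren't shown here: keep the host copy in an `Option` and `take()` it once the upload succeeds, so the RAM is returned immediately:

```rust
// Hypothetical sketch of freeing host memory once a tensor lives on the GPU.

struct DeviceBuffer; // stand-in for a GPU-side allocation

struct Tensor {
    host: Option<Vec<f32>>, // CPU copy; None once offloaded
    device: Option<DeviceBuffer>,
}

impl Tensor {
    fn offload_to_gpu(&mut self) {
        if let Some(data) = self.host.take() {
            // Upload `data` to the GPU; the real call depends on the backend.
            upload(&data);
            self.device = Some(DeviceBuffer);
            // `data` is dropped at the end of this scope, freeing the host
            // RAM, which is exactly the behavior this issue asks for.
        }
    }
}

// Hypothetical routine standing in for the backend-specific copy.
fn upload(_data: &[f32]) {}
```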
I'm currently facing an issue where generation on a GPU sometimes slows down, and it's very hard to determine why (see https://github.com/rustformers/llm/pull/325). It would be great if we could...
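One simple way to make such slowdowns visible, sketched here with plain `std::time::Instant` timing around a hypothetical per-token step, is to record the latency of every token and compare early against late tokens:

```rust
use std::time::Instant;

// Hypothetical per-token generation step.
fn generate_next_token(step: usize) -> u32 {
    step as u32
}

fn main() {
    let mut latencies_ms = Vec::new();

    for step in 0..32 {
        let start = Instant::now();
        let _token = generate_next_token(step);
        latencies_ms.push(start.elapsed().as_secs_f64() * 1e3);
    }

    // Compare early vs. late tokens to spot drift over the run.
    let half = latencies_ms.len() / 2;
    let avg = |s: &[f64]| s.iter().sum::<f64>() / s.len() as f64;
    println!(
        "first half: {:.3} ms/token, second half: {:.3} ms/token",
        avg(&latencies_ms[..half]),
        avg(&latencies_ms[half..])
    );
}
```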
As pointed out in https://github.com/rustformers/llm/pull/291, the quality of embeddings produced by the models at present appears to be suboptimal. Our current approach uses the embedding of the final token as...
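One commonly used alternative to taking only the final token's embedding is mean pooling over all token embeddings (optionally masking padding tokens); a self-contained sketch:

```rust
// Mean pooling: average the per-token hidden states into one sentence embedding.
fn mean_pool(token_embeddings: &[Vec<f32>]) -> Vec<f32> {
    let dim = token_embeddings[0].len();
    let mut pooled = vec![0.0f32; dim];
    for token in token_embeddings {
        for (acc, value) in pooled.iter_mut().zip(token) {
            *acc += value;
        }
    }
    let n = token_embeddings.len() as f32;
    pooled.iter_mut().for_each(|v| *v /= n);
    pooled
}

fn main() {
    // Three tokens with a 4-dimensional hidden state each.
    let tokens = vec![
        vec![1.0, 0.0, 2.0, 0.0],
        vec![3.0, 0.0, 0.0, 4.0],
        vec![2.0, 3.0, 1.0, 2.0],
    ];
    // Expected output: [2.0, 1.0, 1.0, 2.0]
    println!("{:?}", mean_pool(&tokens));
}
```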
**Please describe the feature you want** It would be a nice addition if the GitLab host were configurable, to easily point the Tabby server to a self-hosted GitLab instance. **Additional...