infinity
Infinity is a high-throughput, low-latency REST API for serving vector embeddings, supporting a wide range of text-embedding models and frameworks.
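As a sketch of what a client request to such a server looks like, the snippet below builds an OpenAI-style embeddings request with only the standard library. The port `7997`, the endpoint path, and the model name are assumptions for illustration; adjust them to your deployment.

```python
import json
import urllib.request


def build_request(texts, model, base_url="http://localhost:7997"):
    # Assumed default port; the /embeddings path mirrors the shape of the
    # OpenAI embeddings API ({"model": ..., "input": [...]}).
    return urllib.request.Request(
        f"{base_url}/embeddings",
        data=json.dumps({"model": model, "input": texts}).encode(),
        headers={"Content-Type": "application/json"},
    )


def embed(texts, model, base_url="http://localhost:7997"):
    # POST the request and return one embedding vector per input text.
    with urllib.request.urlopen(build_request(texts, model, base_url)) as resp:
        return [row["embedding"] for row in json.load(resp)["data"]]
```

Calling `embed(["hello"], "BAAI/bge-small-en-v1.5")` against a running server would return a list with one float vector; the model name here is just an example.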
### System Info
os: linux, hardware: gpu, version: 0.0.75
### Information
- [x] Docker + cli
- [ ] pip + cli
- [ ] pip + usage of Python...
### System Info
docker: docker.io/michaelf34/infinity:0.0.75, not set, gpu
Model: `jinaai/jina-colbert-v2`
### Information
- [ ] Docker + cli
- [ ] pip + cli
- [ ] pip + usage...
### Feature request The Infinity Dockerfile is carefully constructed to avoid dragging its build artifacts into the final image, but unfortunately the base image `rocm/pytorch:rocm6.2.3_ubuntu22.04_py3.10_pytorch_release_2.3.0` is not, and it is...
### System Info
```
INFO 2025-03-13 13:02:38,529 datasets INFO: PyTorch version 2.4.1 available. config.py:54
Usage: infinity_emb v2 [OPTIONS]
Infinity API ♾️ cli v2. MIT License. Copyright (c) 2023-now Michael Feil
Multiple...
```
### Feature request It would be nice to have late chunking supported by the library, optionally activated by a parameter passed in the request. This feature is available, for example,...
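To make the request above concrete, here is a minimal, model-free sketch of the late-chunking idea: a long-context model first produces one contextualized embedding per token of the whole document, and chunk vectors are then pooled from those token embeddings rather than from independently embedded chunks. The token vectors below are toy stand-ins; a real model would supply them.

```python
def late_chunk(token_embeddings, chunk_spans):
    """Mean-pool contextualized token embeddings over each (start, end) span."""
    chunks = []
    for start, end in chunk_spans:
        span = token_embeddings[start:end]
        dim = len(span[0])
        # Average each dimension across the tokens of this chunk.
        chunks.append([sum(vec[d] for vec in span) / len(span) for d in range(dim)])
    return chunks


# Toy example: 4 "tokens" with 2-dim embeddings, split into two chunks.
tokens = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0]]
print(late_chunk(tokens, [(0, 2), (2, 4)]))  # → [[2.0, 0.0], [0.0, 3.0]]
```

The key difference from ordinary chunking is that the token embeddings already reflect the full document's context before pooling, so each chunk vector carries cross-chunk information.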
### Feature request infinity version: 0.0.75 I noticed that when a GPU OOM occurs, the service hangs and new requests cannot be executed. Could you provide a mechanism for the...
### System Info
Infinity latest CPU image.
### Information
- [x] Docker + cli
- [ ] pip + cli
- [ ] pip + usage of Python interface...
### Feature request For an embedding service in scenarios with high QPS (queries per second) and sensitivity to latency, if multiple requests are piled up in Infinity's request queue...
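One way such a feature could work is request prioritization at the queue level. This is not Infinity's actual scheduler, just a sketch of the technique: each enqueued request carries a priority, and the batcher pops the most urgent requests first so latency-sensitive traffic can jump ahead of bulk jobs.

```python
import heapq
import itertools


class PriorityRequestQueue:
    def __init__(self):
        self._heap = []
        # Monotonic counter breaks ties, preserving FIFO order per priority.
        self._counter = itertools.count()

    def push(self, request, priority=10):
        # Lower number = more urgent.
        heapq.heappush(self._heap, (priority, next(self._counter), request))

    def pop_batch(self, max_size):
        # Drain up to max_size requests, most urgent first.
        batch = []
        while self._heap and len(batch) < max_size:
            batch.append(heapq.heappop(self._heap)[2])
        return batch


q = PriorityRequestQueue()
q.push("bulk-reindex", priority=10)
q.push("user-query", priority=1)
q.push("bulk-reindex-2", priority=10)
print(q.pop_batch(2))  # → ['user-query', 'bulk-reindex']
```

A real implementation would also need to bound queue depth and shed or time out stale requests, which relates to the OOM-hang report above.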
Currently the endpoint for embedding models on the Infinity server is /embeddings, but I believe the correct OpenAI format would be /v1/embeddings? Correct me if I'm wrong.
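Until the paths are reconciled, a client-side normalizer is one workaround. The helper below is a sketch assuming the server only serves `/embeddings`: it maps either an OpenAI-style base URL ending in `/v1` or a bare host to the same endpoint.

```python
def embeddings_url(base_url):
    # Strip a trailing slash and an OpenAI-style "/v1" suffix, if present,
    # then append the path Infinity actually serves (assumed here).
    base = base_url.rstrip("/")
    if base.endswith("/v1"):
        base = base[: -len("/v1")]
    return base + "/embeddings"


print(embeddings_url("http://localhost:7997/v1"))  # → http://localhost:7997/embeddings
print(embeddings_url("http://localhost:7997"))     # → http://localhost:7997/embeddings
```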
### System Info I am using 0.0.75, and when I pass `--no-model-warmup` it still tries to warm up the model. Therefore CTRL+C to exit also does not work while warming...