
MRL different dimensions not normalized

Open gaohongkui opened this issue 10 months ago • 6 comments

System Info

OS: Linux
Hardware: GPU
Version: 0.0.75

Information

  • [x] Docker + cli
  • [ ] pip + cli
  • [ ] pip + usage of Python interface

Tasks

  • [x] An officially supported CLI command
  • [ ] My own modifications

Reproduction

Hello, I found that for an embedding model deployed with infinity, when requesting through the OpenAI-compatible interface and passing the dimensions parameter, the response has the requested dimensionality, but the embedding is not normalized. It appears to be truncated directly.
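A minimal reproduction sketch; the base URL assumes infinity's default port 7997, and the model name is only a placeholder for whatever MRL-capable model is being served:

```python
import numpy as np
from openai import OpenAI

# Point the OpenAI client at the local infinity server (assumed URL).
client = OpenAI(base_url="http://localhost:7997", api_key="sk-dummy")

resp = client.embeddings.create(
    model="mixedbread-ai/mxbai-embed-large-v1",  # placeholder model name
    input="hello world",
    dimensions=256,  # request a truncated MRL embedding
)
emb = np.asarray(resp.data[0].embedding)
print(len(emb))             # 256, as requested
print(np.linalg.norm(emb))  # expected ~1.0, but observed < 1.0 (plain truncation)
```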


For comparison, all embeddings returned by OpenAI itself are normalized.


gaohongkui avatar Feb 24 '25 09:02 gaohongkui

Is this a bug or a deliberate design choice? @michaelfeil

gaohongkui avatar Mar 16 '25 09:03 gaohongkui

@gaohongkui Some models also contain a PyTorch layer for normalization. Given that the vector output is already normalized, what do you want to do?

What would be the expected behavior?

Options:

  • re-normalize the truncated vector back to unit length 1 (see the sketch below)
  • disabling normalization is likely not an available option!
  • how do API providers (Cohere and OpenAI) solve this? Both offer MRL afaik
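For the first option, a quick NumPy sketch (illustrative only, not infinity's code) shows that re-normalizing the truncated vector gives exactly the same result as truncating the raw output and then normalizing, so the final order would not matter as long as a normalization happens last:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1024)                    # raw model output (pre-normalization)
full = x / np.linalg.norm(x)                 # normalized full-dimension embedding

trunc = full[:256]                           # MRL truncation
print(np.linalg.norm(trunc))                 # < 1.0: truncation breaks unit length

renorm = trunc / np.linalg.norm(trunc)       # option 1: re-normalize after truncation
direct = x[:256] / np.linalg.norm(x[:256])   # truncate raw output, then normalize
print(np.allclose(renorm, direct))           # True: the two orders agree
```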

michaelfeil avatar Mar 16 '25 17:03 michaelfeil

I only tested OpenAI's API. Their approach is: when dimensions is specified in the request parameters, the embedding is automatically truncated and normalized before being returned.
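A sketch to check this against OpenAI directly (requires a real OpenAI API key; text-embedding-3-small is one of the models that accepts dimensions):

```python
import numpy as np
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

resp = client.embeddings.create(
    model="text-embedding-3-small",
    input="hello world",
    dimensions=256,
)
emb = np.asarray(resp.data[0].embedding)
print(len(emb), np.linalg.norm(emb))  # 256, ~1.0: normalized after truncation
```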

But I think it would be better to add an additional parameter, normalize: bool, so users can choose.

gaohongkui avatar Mar 17 '25 06:03 gaohongkui

I feel like you did not read my response at all.

Currently, the embeddings are normalized twice: once by infinity and once by sentence-transformers. The MRL truncation is applied afterwards, as it's a per-request parameter.

If you truncate after normalizing, you get the same result. I think you mean the other way round. What do normalized MRL embeddings look like on OpenAI?

michaelfeil avatar Mar 17 '25 15:03 michaelfeil

I think I now understand what you mean. infinity's current normalization process is:

input sent
step 1 -> model inference
step 2 -> embedding (may or may not be normalized)
step 3 -> infinity normalizes (full-dimension embedding)
step 4 -> truncate (if requested dim < max dim)
step 5 -> return

But the problem I reported is that the normalized embedding obtained in step 3 becomes unnormalized after being directly truncated in step 4.

Therefore, I think it would be more logical to swap steps 3 and 4, and to let users pass normalize: bool in the request to decide whether infinity normalizes at all, as sketched below.
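A sketch of the proposed order (illustrative only; the postprocess helper and its signature are hypothetical, not infinity's actual code):

```python
import numpy as np

def postprocess(raw: np.ndarray, dim: int | None = None, normalize: bool = True) -> np.ndarray:
    """Proposed order: truncate first (old step 4), then optionally normalize (old step 3)."""
    emb = raw[:dim] if dim is not None else raw
    if normalize:
        emb = emb / np.linalg.norm(emb)
    return emb

emb = postprocess(np.random.default_rng(0).normal(size=1024), dim=256)
print(emb.shape, np.linalg.norm(emb))  # (256,) 1.0
```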

gaohongkui avatar Mar 18 '25 07:03 gaohongkui

Gotcha. Note that step 3 normalizes the whole batch at once, while normalize would be a per-request parameter, as MRL already is. Therefore it would make sense to do truncate then normalize, but it's hard; see the sketch below.
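A sketch of the tension (illustrative shapes only): step 3 is a single vectorized batch op, whereas truncate-then-normalize has to run per row because dimensions differs per request:

```python
import numpy as np

rng = np.random.default_rng(1)
batch = rng.normal(size=(4, 1024))  # one inference batch, requests mixed together

# Step 3 today: normalize the whole batch in one vectorized op.
batch = batch / np.linalg.norm(batch, axis=-1, keepdims=True)

# dimensions differs per request, so truncate-then-normalize cannot stay a
# single batch op; each row needs its own slice and norm:
dims = [256, 512, 1024, 64]  # hypothetical per-request dimensions
out = [row[:d] / np.linalg.norm(row[:d]) for row, d in zip(batch, dims)]
print([v.shape[0] for v in out])                          # [256, 512, 1024, 64]
print([round(float(np.linalg.norm(v)), 6) for v in out])  # all 1.0
```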

michaelfeil avatar Mar 18 '25 14:03 michaelfeil