MRL different dimensions not normalized
System Info
os: linux
hardware: gpu
version: 0.0.75
Information
- [x] Docker + cli
- [ ] pip + cli
- [ ] pip + usage of Python interface
Tasks
- [x] An officially supported CLI command
- [ ] My own modifications
Reproduction
Hello, I found that for an embedding model deployed with infinity, when requesting embeddings through the OpenAI-compatible interface and passing the dimensions parameter, the response does have the requested dimension, but the embedding is not normalized; it appears to be truncated directly.
All embeddings returned by OpenAI are normalized.
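A minimal repro sketch, assuming a local infinity deployment at http://localhost:7997 and an MRL-capable model (both placeholders; substitute your own setup):

```python
import numpy as np
from openai import OpenAI

# Placeholder endpoint and model; point these at your infinity deployment.
client = OpenAI(base_url="http://localhost:7997", api_key="dummy")

resp = client.embeddings.create(
    model="mixedbread-ai/mxbai-embed-large-v1",  # assumed MRL-capable model
    input="hello world",
    dimensions=256,
)
vec = np.array(resp.data[0].embedding)
print(len(vec))             # 256, as requested
print(np.linalg.norm(vec))  # < 1.0 here: truncated, but not re-normalized
```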
Is this a bug or a deliberate design? @michaelfeil
@gaohongkui Some models also contain a PyTorch layer for normalization. Given that the output vector is already normalized, what do you want to do?
What would be the expected behavior?
Options:
- re-normalize the already-normalized vector to unit length 1
- disabling normalization is likely not an available option!
- how do API providers (Cohere and OpenAI) solve this? Both offer MRL afaik
I only tested OpenAI's API. Their approach, when dimensions is specified in the request parameters, is to normalize, automatically truncate, and return.
But I think it would be better to add an extra parameter, normalize: bool, so users can choose.
I feel like you did not read my response at all.
Currently, the embeddings are normalized twice: once by infinity and once by sentence-transformers. MRL is applied afterwards, as it's a per-request parameter.
If you truncate after normalizing, you get the same result. I think you mean the other way round. What do normalized MRL embeddings look like on OpenAI?
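To make the order concrete, a small numpy sketch with a random stand-in vector: truncating a unit vector shrinks its norm below 1, so a unit-length MRL embedding requires normalizing after the truncation.

```python
import numpy as np

v = np.random.randn(1024)
v /= np.linalg.norm(v)      # full-dim embedding, norm == 1.0

t = v[:256]                 # MRL truncation of the normalized vector
print(np.linalg.norm(t))    # < 1.0: no longer unit length

t = t / np.linalg.norm(t)   # normalize *after* truncating
print(np.linalg.norm(t))    # == 1.0
```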
I think I see what you mean. Infinity's current normalization flow is:
input sent
step 1 -> model inference
step 2 -> embedding (may or may not already be normalized)
step 3 -> infinity normalizes (full-dimension embedding)
step 4 -> truncate (if the requested dim < max dim)
step 5 -> return
But the problem I reported is that the embedding normalized in step 3 becomes unnormalized after being truncated directly in step 4.
Therefore, I think it would be more logical to swap steps 3 and 4, and to let users pass normalize: bool in the request to decide whether infinity normalizes; see the sketch below.
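A minimal sketch of that proposed per-request flow (a hypothetical helper, not infinity's actual code):

```python
import numpy as np

def postprocess(raw_embedding, dim=None, normalize=True):
    """Hypothetical per-request post-processing: truncate first,
    then optionally L2-normalize, as proposed above."""
    v = np.asarray(raw_embedding, dtype=np.float64)
    if dim is not None:
        v = v[:dim]                # step 4 first (MRL truncation)
    if normalize:
        v = v / np.linalg.norm(v)  # step 3 second, opt-in per request
    return v
```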
Gotcha. Note that step 3 is applied batch-wise, while normalize would be a per-request parameter, as MRL already is. So it would make sense to truncate and then normalize, but it's hard to do.
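To illustrate why that is hard: roughly, step 3 runs once per inference batch, while truncate-then-normalize would have to run per request after the batch is split (a sketch, not infinity's internals):

```python
import torch
import torch.nn.functional as F

# Batch-wise normalization, roughly what step 3 does today:
batch = torch.randn(32, 1024)            # embeddings for one inference batch
batch = F.normalize(batch, p=2, dim=-1)  # one op for the whole batch

# With per-request MRL, each request slices its own dimension, so the
# re-normalization also has to happen per request, after the split:
req_dim = 256                            # hypothetical per-request dimensions
vec = batch[0, :req_dim]
vec = vec / vec.norm()                   # unit length again
```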