LLPhant

Support user-defined embedding dimensions in Generators

Open · bernard-ng opened this issue 9 months ago · 1 comment

https://platform.openai.com/docs/api-reference/embeddings/create#embeddings-create-dimensions

`dimensions` (integer, optional): The number of dimensions the resulting output embeddings should have. Only supported in `text-embedding-3` and later models.

Some services let you choose the embedding dimension in their configuration; for OpenAI, this is the `dimensions` option in the model options. Being able to set it is useful when you want to swap one embedding generator for another (e.g. for research or benchmarking), because the vector store then receives vectors of the same length regardless of which generator produced them.
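As a rough illustration of what the option changes on the wire, here is a minimal sketch of the JSON body sent to OpenAI's embeddings endpoint. The helper `buildEmbeddingsRequestBody()` is hypothetical (not part of LLPhant); only the `model`, `input` and `dimensions` keys come from the OpenAI API reference linked above.

```php
<?php
declare(strict_types=1);

// Hypothetical helper, not part of LLPhant: builds the request body for
// OpenAI's embeddings endpoint, adding "dimensions" only when it is set.
function buildEmbeddingsRequestBody(string $model, string $input, ?int $dimensions = null): array
{
    $body = ['model' => $model, 'input' => $input];

    if ($dimensions !== null) {
        // Per the API reference, only text-embedding-3 and later models accept this.
        $body['dimensions'] = $dimensions;
    }

    return $body;
}

echo json_encode(buildEmbeddingsRequestBody('text-embedding-3-small', 'Hello world', 512)), PHP_EOL;
// prints {"model":"text-embedding-3-small","input":"Hello world","dimensions":512}
```

Leaving the key out when no dimension is requested keeps the default behaviour for older models such as `text-embedding-ada-002`, which do not support it.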

Currently, the length is hard-coded in each generator:

final class OpenAI3SmallEmbeddingGenerator extends AbstractOpenAIEmbeddingGenerator
{
    public function getEmbeddingLength(): int
    {
        return 1536;
    }
// ...
}

I think `getEmbeddingLength()` should take the user's customizations into account, or the generator should expose methods to set the embedding dimension, for example:

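// native embedding size of the model (e.g. 1536 for text-embedding-3-small)
public function getDefaultEmbeddingLength(): int;

// default length unless the user has set one
public function getEmbeddingLength(): int;

// alter the config to add the "dimensions" option
// $dimension must be <= the default embedding length: OpenAI's "dimensions" can only shorten embeddings
public function setEmbeddingLength(int $dimension): void;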
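A minimal sketch of how these three methods could fit together, assuming the generator keeps an array of extra request options that get merged into each embeddings call. The class names `SketchOpenAIEmbeddingGenerator` and `SketchOpenAI3SmallEmbeddingGenerator`, the `$modelOptions` property and `buildRequestBody()` are all hypothetical stand-ins, not LLPhant's actual classes.

```php
<?php
declare(strict_types=1);

// Hypothetical sketch, not LLPhant's code: a generator whose embedding length
// defaults to the model's native size but can be shortened by the user.
abstract class SketchOpenAIEmbeddingGenerator
{
    /** @var array<string, mixed> extra options merged into each embeddings request (assumption) */
    protected array $modelOptions = [];

    private ?int $embeddingLength = null;

    abstract public function getModelName(): string;

    // Native embedding size of the model.
    abstract public function getDefaultEmbeddingLength(): int;

    // Default length unless the user has set one.
    public function getEmbeddingLength(): int
    {
        return $this->embeddingLength ?? $this->getDefaultEmbeddingLength();
    }

    // Store the requested length and forward it to the API as "dimensions".
    public function setEmbeddingLength(int $dimension): void
    {
        $max = $this->getDefaultEmbeddingLength();
        if ($dimension < 1 || $dimension > $max) {
            throw new InvalidArgumentException(
                sprintf('Embedding length must be between 1 and %d, got %d', $max, $dimension)
            );
        }
        $this->embeddingLength = $dimension;
        $this->modelOptions['dimensions'] = $dimension;
    }

    // Body for the embeddings request: model and input, plus any extra options.
    public function buildRequestBody(string $text): array
    {
        return array_merge(['model' => $this->getModelName(), 'input' => $text], $this->modelOptions);
    }
}

final class SketchOpenAI3SmallEmbeddingGenerator extends SketchOpenAIEmbeddingGenerator
{
    public function getModelName(): string
    {
        return 'text-embedding-3-small';
    }

    public function getDefaultEmbeddingLength(): int
    {
        return 1536;
    }
}

$generator = new SketchOpenAI3SmallEmbeddingGenerator();
echo $generator->getEmbeddingLength(), PHP_EOL; // prints 1536
$generator->setEmbeddingLength(512);
echo $generator->getEmbeddingLength(), PHP_EOL; // prints 512
```

Validating against the default length in the setter means a bad value fails locally instead of at the API, and since `getEmbeddingLength()` now reflects the user's choice, code that sizes a vector store from it keeps working unchanged.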

What do you think?

bernard-ng, May 04 '24 09:05

Hey @bernard-ng ,

Yes I agree. Do you want to contribute on this one?

MaximeThoonsen, May 06 '24 07:05

@MaximeThoonsen, can you have a look?

bernard-ng, May 29 '24 00:05