Reducing dimensions of new OpenAI embedding models is not working
🐛 Describe the bug
I noticed that support for the new OpenAI embedding models such as text-embedding-3-small and text-embedding-3-large has been added. These models can reduce their output dimensions from the default (e.g. 1536 for text-embedding-3-small). I want to reduce the embedding dimensions to 1024, but it seems that the vector_dimension parameter is being set instead of the dimensions parameter in the OpenAIEmbedder class. My YAML file is below:
```yaml
vectordb:
  provider: elasticsearch
  config:
    collection_name: 'collection_name'
    es_url: ['es_host']
    http_auth:
      - id
      - password
    verify_certs: false
embedder:
  provider: openai
  config:
    model: 'text-embedding-3-small'
    vector_dimension: 1024
```
The above gives me the following error when adding data to Elasticsearch:

```
'error': {'type': 'mapper_parsing_exception', 'reason': 'failed to parse', 'caused_by': {'type': 'illegal_argument_exception', 'reason': 'Field [embeddings] of type [dense_vector] of doc has exceeded the number of dimensions [1024] defined in mapping'}}
```
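For reference, the OpenAI embeddings API itself accepts a `dimensions` argument for these models, so the reduction works when it is actually passed through. A minimal standalone sketch, outside embedchain (assumes the openai>=1.x Python client and OPENAI_API_KEY in the environment):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Ask the model for 1024-dimensional vectors instead of the default 1536
response = client.embeddings.create(
    model="text-embedding-3-small",
    input="hello world",
    dimensions=1024,
)
print(len(response.data[0].embedding))  # -> 1024
```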
Hey @ad3sai, thanks for creating the issue. Based on the error message, it seems you previously used this index with a different embedding size. You can check whether the embeddings field in your Elasticsearch index mapping was set to 1024 dimensions by calling the get-mapping API: https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-get-mapping.html
The solution here is to use a different collection_name in the configuration, which will create a new index and use it instead.
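A quick way to run that check from Python (a sketch using the elasticsearch client; the host, credentials, and index name are taken from the config above, so adjust them to your cluster):

```python
from elasticsearch import Elasticsearch

# Connection details mirror the vectordb config above
es = Elasticsearch("https://es_host", basic_auth=("id", "password"), verify_certs=False)

mapping = es.indices.get_mapping(index="collection_name")
# Inspect the dims declared for the dense_vector field, e.g. something like:
# mapping["collection_name"]["mappings"]["properties"]["embeddings"]["dims"]
print(mapping)
```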
Hey @deshraj, thank you for your reply. I got it working by creating a custom class based on OpenAIEmbeddingFunction from chromadb.utils.embedding_functions. It seems the dimensions parameter is not passed along when the embeddings are created, so they end up with the model's default dimensions. Adding dimensions here solved it: https://github.com/chroma-core/chroma/blob/a370684dd032eaf52ad9619c4811449a52cc1e2c/chromadb/utils/embedding_functions.py#L203
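For anyone else working around this, here is a minimal sketch of such a custom embedding function. The class name and structure are illustrative rather than chromadb's actual implementation; it assumes the openai>=1.x client:

```python
from typing import List

from openai import OpenAI


class OpenAIEmbeddingFunctionWithDims:
    """Embedding function that forwards `dimensions` to the OpenAI API."""

    def __init__(self, api_key: str, model_name: str = "text-embedding-3-small",
                 dimensions: int = 1024):
        self._client = OpenAI(api_key=api_key)
        self._model = model_name
        self._dimensions = dimensions

    def __call__(self, input: List[str]) -> List[List[float]]:
        # Passing `dimensions` makes the API return vectors of the requested size
        response = self._client.embeddings.create(
            model=self._model,
            input=input,
            dimensions=self._dimensions,
        )
        # Sort by index so embeddings line up with the input documents
        return [item.embedding for item in sorted(response.data, key=lambda e: e.index)]
```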