Reduce Dimensions of new OpenAI embedding models is not working

Open ad3sai opened this issue 1 year ago • 2 comments

🐛 Describe the bug

I noticed that support for the new OpenAI embedding models, such as text-embedding-3-small and text-embedding-3-large, has been added. These models can reduce the output dimensions from the default (e.g. 1536 for text-embedding-3-small). I want to reduce the embedding dimensions to 1024, but it seems the vector_dimension parameter is being set instead of the dimensions parameter in the OpenAIEmbedder class. I have the YAML file below:

vectordb:
  provider: elasticsearch
  config:
    collection_name: 'collection_name'
    es_url: ['es_host']
    http_auth:
      - id
      - password
    verify_certs: false

embedder:
  provider: openai
  config:
    model: 'text-embedding-3-small'
    vector_dimension: 1024

The above config gives me this error when adding data to Elasticsearch: 'error': {'type': 'mapper_parsing_exception', 'reason': 'failed to parse', 'caused_by': {'type': 'illegal_argument_exception', 'reason': 'Field [embeddings] of type [dense_vector] of doc has exceeded the number of dimensions [1024] defined in mapping'}}

ad3sai avatar Jan 29 '24 17:01 ad3sai

Hey @ad3sai, thanks for creating the issue. Based on the error message, it seems you were using the index with a different embedding size earlier. You can check whether the embedding field in your Elasticsearch index is set to 1024 by calling the get-mapping API: https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-get-mapping.html

The solution here is to use a different collection_name in the configuration, which will create a new index and use it.
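
For anyone who wants to run the mapping check from Python, here is a minimal sketch. It only parses the standard shape of a get-mapping response; the helper name `dense_vector_dims`, the index name `my_index`, and the sample response are made up for illustration. The field name `embeddings` is taken from the error message above.

```python
from typing import Any, Mapping, Optional


def dense_vector_dims(
    mapping_response: Mapping[str, Any],
    index: str,
    field: str = "embeddings",
) -> Optional[int]:
    """Return the `dims` of a dense_vector field, or None if it is not found.

    `mapping_response` is expected to look like the body of
    GET /<index>/_mapping, i.e. {index: {"mappings": {"properties": {...}}}}.
    """
    properties = (
        mapping_response.get(index, {}).get("mappings", {}).get("properties", {})
    )
    field_mapping = properties.get(field, {})
    if field_mapping.get("type") != "dense_vector":
        return None
    return field_mapping.get("dims")


# Hypothetical sample response; with the official Python client the real one
# could come from es.indices.get_mapping(index="my_index") (in 8.x clients the
# plain dict is available as `.body`).
sample = {
    "my_index": {
        "mappings": {
            "properties": {"embeddings": {"type": "dense_vector", "dims": 1024}}
        }
    }
}

print(dense_vector_dims(sample, "my_index"))
```

If the value returned differs from the length of the vectors the embedder actually produces, Elasticsearch rejects the document with the kind of mapper_parsing_exception shown in the report.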

deshraj avatar Jan 29 '24 17:01 deshraj

Hey @deshraj, thank you for your reply. I got it working by creating a custom class for OpenAIEmbeddingFunction from chromadb.utils.embedding_functions. It seems the dimensions parameter is not passed when creating the embeddings, so they end up being created with the default dimensions. Adding dimensions here solved it: https://github.com/chroma-core/chroma/blob/a370684dd032eaf52ad9619c4811449a52cc1e2c/chromadb/utils/embedding_functions.py#L203
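
A rough sketch of what such a custom class could look like, for anyone hitting the same problem. This is not the exact class from the workaround and not chromadb's implementation: the class name `DimensionedOpenAIEmbedder` is hypothetical, and it assumes an openai v1-style client (an object exposing `embeddings.create(...)`) is passed in. The only point is that `dimensions` gets forwarded to the embeddings call.

```python
from typing import Any, List, Optional, Sequence


class DimensionedOpenAIEmbedder:
    """Hypothetical embedding callable that forwards `dimensions` to OpenAI.

    `client` is assumed to behave like `openai.OpenAI()` from the v1 SDK,
    i.e. it exposes `client.embeddings.create(model=..., input=..., dimensions=...)`.
    """

    def __init__(
        self,
        client: Any,
        model: str = "text-embedding-3-small",
        dimensions: Optional[int] = None,
    ) -> None:
        self.client = client
        self.model = model
        self.dimensions = dimensions

    def __call__(self, input: Sequence[str]) -> List[List[float]]:
        kwargs = {"model": self.model, "input": list(input)}
        # Only send `dimensions` when it was requested; otherwise the API
        # falls back to the model's default output size.
        if self.dimensions is not None:
            kwargs["dimensions"] = self.dimensions
        response = self.client.embeddings.create(**kwargs)
        return [item.embedding for item in response.data]


# Usage sketch (requires the openai package and an API key, so left as a comment):
#   from openai import OpenAI
#   embed = DimensionedOpenAIEmbedder(OpenAI(), dimensions=1024)
#   vectors = embed(["hello world"])  # each vector should then have 1024 entries
```

Taking the client as a constructor argument keeps the class easy to test with a stub and avoids tying it to one SDK version.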

ad3sai avatar Jan 29 '24 18:01 ad3sai