
Add ‘dimensions’ parameter for OpenAIEmbedding

Open SliverBulle opened this issue 5 months ago • 4 comments

Description

This pull request adds a dimensions option to GraphRAG's embedding configuration, letting users request a custom embedding size. When the option is omitted, the model's default dimensions are used, so existing setups are unaffected.

Related Issues

None.

Proposed Changes

Modified the GraphRAG embedding code to accept a dimensions parameter, enabling users to customize its size. Added relevant error checks to ensure the input dimensions value is within reasonable limits.
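The PR text does not show the checks themselves, so the following is only a minimal sketch of what such validation could look like. The helper name build_embedding_kwargs and the MAX_DIMENSIONS table are hypothetical and not part of GraphRAG; the limits are the native output sizes of the two text-embedding-3 models.

```python
from typing import Any, Dict, Optional

# Assumption for illustration: cap the request at each model's native size.
MAX_DIMENSIONS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
}


def build_embedding_kwargs(
    model: str, text: str, dimensions: Optional[int] = None
) -> Dict[str, Any]:
    """Build request arguments, adding `dimensions` only when it is set and valid."""
    kwargs: Dict[str, Any] = {"model": model, "input": text}
    if dimensions is None:
        # No override: the model's default size is used, as before the change.
        return kwargs
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(dimensions, bool) or not isinstance(dimensions, int):
        raise ValueError("dimensions must be an integer")
    if dimensions <= 0:
        raise ValueError("dimensions must be positive")
    limit = MAX_DIMENSIONS.get(model)
    if limit is not None and dimensions > limit:
        raise ValueError(f"dimensions must not exceed {limit} for {model}")
    kwargs["dimensions"] = dimensions
    return kwargs
```

Leaving the key out entirely when dimensions is None keeps existing configurations on the default code path. Models missing from the table are passed through without an upper-bound check in this sketch.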

Checklist

  • [√] I have tested these changes locally.
  • [√] I have reviewed the code changes.
  • [ ] I have updated the documentation (if necessary).
  • [ ] I have added appropriate unit tests (if applicable).

Additional Notes

According to https://openai.com/index/new-embedding-models-and-api-updates/, text-embedding-3-large has a default dimension size of 3072, which is not suitable for everyone. Excessively large embedding dimensions can lead to significant computational and storage overhead without yielding proportional performance improvements.
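To put a rough number on the storage side, here is a back-of-the-envelope sketch. It assumes vectors are stored as raw float32 values (4 bytes each) and ignores any index or metadata overhead; the function name and the corpus size of one million chunks are made up for illustration.

```python
def embedding_storage_bytes(
    num_vectors: int, dimensions: int, bytes_per_value: int = 4
) -> int:
    """Raw size of the stored vectors, assuming float32 unless told otherwise."""
    return num_vectors * dimensions * bytes_per_value


if __name__ == "__main__":
    num_chunks = 1_000_000  # hypothetical corpus size
    for dims in (3072, 1024):
        gigabytes = embedding_storage_bytes(num_chunks, dims) / 1e9
        print(f"{dims} dimensions: {gigabytes:.2f} GB")
```

Under these assumptions, one million vectors take about 12.3 GB at 3072 dimensions and about 4.1 GB at 1024, a threefold reduction. The cost of a brute-force similarity search shrinks by roughly the same factor.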

How to use

Add dimensions: <your dimensions> to settings.yaml after initializing the project.

embeddings:
  async_mode: threaded # or asyncio
  llm:
    api_key: ${EMBEDDING_KEY}
    type: azure_openai_embedding # or openai_embedding
    model: text-embedding-3-large
    api_base: ${EMBEDDING_BASE}
    api_version: "2024-02-01"
    # organization: <organization_id>
    deployment_name: text-embedding-3-large
    dimensions: 1024

Add dimensions=<your dimensions> to the OpenAIEmbedding constructor (in graphrag_local_search.ipynb or the global search equivalent).

text_embedder = OpenAIEmbedding(
    api_key=api_key,
    api_base=azure_endpoint,
    api_version=api_version,
    api_type=OpenaiApiType.AzureOpenAI,
    model=embedding_model,
    deployment_name=deployment_name,
    max_retries=20,
    dimensions=1024,
)

SliverBulle • Aug 28 '24 09:08