
Add ‘dimensions’ parameter for OpenAIEmbedding

Open SliverBulle opened this issue 1 year ago • 4 comments

Description

This pull request adds a dimensions option to GraphRAG's embedding configuration so that users can request a custom embedding size. Existing setups that rely on the model's default dimensions are unaffected.

Related Issues

None.

Proposed Changes

Modified the GraphRAG embedding code to accept a dimensions parameter, enabling users to customize its size. Added relevant error checks to ensure the input dimensions value is within reasonable limits.
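For readers who want a picture of what such a check might look like, here is a minimal sketch. It is not the code in this PR: the function name validate_dimensions and the upper bound of 3072 (the default size of text-embedding-3-large) are assumptions chosen for illustration.

```python
from typing import Optional

# Assumed upper bound: 3072 is the default size of text-embedding-3-large.
# The limits actually enforced by the PR are not shown in this thread.
MAX_DIMENSIONS = 3072


def validate_dimensions(
    dimensions: Optional[int], max_dimensions: int = MAX_DIMENSIONS
) -> Optional[int]:
    """Return dimensions unchanged if acceptable, raise ValueError otherwise.

    None means "use the model's default size" and is always accepted.
    """
    if dimensions is None:
        return None
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(dimensions, bool) or not isinstance(dimensions, int):
        raise ValueError(f"dimensions must be an integer, got {dimensions!r}")
    if not 1 <= dimensions <= max_dimensions:
        raise ValueError(
            f"dimensions must be between 1 and {max_dimensions}, got {dimensions}"
        )
    return dimensions
```

Accepting None keeps the current default behaviour intact, which matches the goal stated in the description.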

Checklist

  • [√] I have tested these changes locally.
  • [√] I have reviewed the code changes.
  • [ ] I have updated the documentation (if necessary).
  • [ ] I have added appropriate unit tests (if applicable).

Additional Notes

According to https://openai.com/index/new-embedding-models-and-api-updates/, text-embedding-3-large has a default dimension size of 3072, which is not suitable for everyone. Excessive embedding dimensions can lead to significant computational and storage overhead without yielding proportional performance improvements.
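To put a rough number on the storage side, here is a back-of-the-envelope sketch. It assumes vectors are stored as 32-bit floats and ignores index and metadata overhead; the one million vector count is an arbitrary example, not a measurement from GraphRAG.

```python
BYTES_PER_FLOAT32 = 4  # assumption: each component is stored as a 32-bit float


def embedding_storage_bytes(num_vectors: int, dimensions: int) -> int:
    """Raw size of the vectors alone, ignoring index and metadata overhead."""
    return num_vectors * dimensions * BYTES_PER_FLOAT32


full = embedding_storage_bytes(1_000_000, 3072)
reduced = embedding_storage_bytes(1_000_000, 1024)
print(f"3072 dims: {full / 1e9:.2f} GB")     # 12.29 GB
print(f"1024 dims: {reduced / 1e9:.2f} GB")  # 4.10 GB
```

Under these assumptions, going from 3072 to 1024 dimensions cuts raw vector storage to one third.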

How to use

Add dimensions: <your dimensions> to settings.yaml after initializing the project:

embeddings:
  async_mode: threaded # or asyncio
  llm:
    api_key: ${EMBEDDING_KEY}
    type: azure_openai_embedding # or openai_embedding
    model: text-embedding-3-large
    api_base: ${EMBEDDING_BASE}
    api_version: "2024-02-01"
    # organization: <organization_id>
    deployment_name: text-embedding-3-large
    dimensions: 1024

Add dimensions=<your dimensions> to the OpenAIEmbedding constructor (in graphrag_local_search.ipynb or global search):

text_embedder = OpenAIEmbedding(
    api_key=api_key,
    api_base=azure_endpoint,
    api_version=api_version,
    api_type=OpenaiApiType.AzureOpenAI,
    model=embedding_model,
    deployment_name=deployment_name,
    max_retries=20,
    dimensions=1024,
)

SliverBulle avatar Aug 28 '24 09:08 SliverBulle

Moreover, what bothers me is that once dimensions has been set in settings.yaml (or left unset, so that it defaults to dimensions=None and is not passed to OpenAI's client.embeddings.create), that value (either None or a specific number) persists for the whole project even if settings.yaml is changed later. So I have to create a new directory, run python -m graphrag.index --init --root ./ragtest, and set up settings.yaml again every time I want to set up a new project.
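The dimensions=None case mentioned above can be pictured with a small sketch. The helper name build_embedding_kwargs is hypothetical and this is not the PR's implementation; it only illustrates sending the dimensions argument to client.embeddings.create when a value was actually configured.

```python
from typing import Any, Dict, List, Optional


def build_embedding_kwargs(
    model: str, texts: List[str], dimensions: Optional[int] = None
) -> Dict[str, Any]:
    """Assemble keyword arguments for client.embeddings.create.

    Hypothetical helper for illustration; not the code in this PR.
    """
    kwargs: Dict[str, Any] = {"model": model, "input": texts}
    if dimensions is not None:
        # Only send the parameter when the user asked for a custom size,
        # so the default behaviour stays exactly as before.
        kwargs["dimensions"] = dimensions
    return kwargs


# With the openai package installed and a configured client, the result
# would be used as: client.embeddings.create(**kwargs)
print(build_embedding_kwargs("text-embedding-3-large", ["hello"]))
print(build_embedding_kwargs("text-embedding-3-large", ["hello"], dimensions=1024))
```

Leaving the key out entirely, rather than sending an explicit null, means requests from projects that never set dimensions look the same as they did before the change.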

SliverBulle avatar Aug 28 '24 09:08 SliverBulle

Hi!

Is this a revision of #1020 ?

AlonsoGuevara avatar Aug 28 '24 22:08 AlonsoGuevara

Hi!

Is this a revision of #1020 ?

Yes! Sorry, I forgot to close #1020.

SliverBulle avatar Aug 29 '24 01:08 SliverBulle

Hi @AlonsoGuevara, I'm following the progress of this PR, and the dimensions parameter for embeddings is a useful enhancement. I would like to have this feature in GraphRAG. Please let me know if there's anything I can do to help move this forward.

ZohebAbai avatar Oct 01 '24 08:10 ZohebAbai