Add ‘dimensions’ parameter for OpenAIEmbedding
Description
This pull request adds a dimensions option to GraphRAG's embedding configuration so that users can request a specific embedding size. When the option is not set, the model's default dimensions are used, so existing usage is unaffected.
Related Issues
None.
Proposed Changes
Modified the GraphRAG embedding code to accept a dimensions parameter, enabling users to customize its size. Added relevant error checks to ensure the input dimensions value is within reasonable limits.
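As a rough illustration of the kind of check described above (not the actual code in this PR), the sketch below validates a dimensions value and only adds it to the request arguments when it is set. The helper names `validate_dimensions` and `build_embedding_kwargs`, and the upper bound of 3072, are assumptions made for this example.

```python
from typing import Any, Optional

# Assumed upper bound for illustration: 3072 is the default size of
# text-embedding-3-large. Other models may need a different limit.
MAX_DIMENSIONS = 3072


def validate_dimensions(
    dimensions: Optional[int], max_dimensions: int = MAX_DIMENSIONS
) -> Optional[int]:
    """Return the value unchanged if valid; None means 'use the model default'."""
    if dimensions is None:
        return None
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(dimensions, bool) or not isinstance(dimensions, int):
        raise TypeError(f"dimensions must be an int or None, got {dimensions!r}")
    if not 1 <= dimensions <= max_dimensions:
        raise ValueError(
            f"dimensions must be between 1 and {max_dimensions}, got {dimensions}"
        )
    return dimensions


def build_embedding_kwargs(
    model: str, texts: list[str], dimensions: Optional[int] = None
) -> dict[str, Any]:
    """Build request arguments, leaving out 'dimensions' when it is None."""
    kwargs: dict[str, Any] = {"model": model, "input": texts}
    checked = validate_dimensions(dimensions)
    if checked is not None:
        kwargs["dimensions"] = checked
    return kwargs
```

The resulting dictionary could then be passed on as `client.embeddings.create(**kwargs)`. Leaving the key out entirely when no value is configured keeps the default behaviour identical to what it was before the option existed.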
Checklist
- [x] I have tested these changes locally.
- [x] I have reviewed the code changes.
- [ ] I have updated the documentation (if necessary).
- [ ] I have added appropriate unit tests (if applicable).
Additional Notes
According to https://openai.com/index/new-embedding-models-and-api-updates/, text-embedding-3-large has a default dimension size of 3072, which is not suitable for everyone. Excessive embedding dimensions can lead to significant computational and storage overhead without yielding proportional performance improvements.
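To give a sense of the storage side of that overhead, here is a back-of-the-envelope sketch. It assumes each value is stored as a 4-byte float32 and ignores any index or metadata overhead. The corpus size of one million vectors is an arbitrary example, and `embedding_storage_bytes` is a name made up for this illustration.

```python
def embedding_storage_bytes(
    num_vectors: int, dimensions: int, bytes_per_value: int = 4
) -> int:
    """Raw size of the stored vectors, assuming float32 values by default."""
    return num_vectors * dimensions * bytes_per_value


# Arbitrary example: one million embedded text units.
full = embedding_storage_bytes(1_000_000, 3072)
reduced = embedding_storage_bytes(1_000_000, 1024)

print(f"3072 dimensions: {full / 1e9:.2f} GB")     # 12.29 GB
print(f"1024 dimensions: {reduced / 1e9:.2f} GB")  # 4.10 GB
print(f"ratio: {full / reduced:.1f}x")             # 3.0x
```

Under these assumptions, going from 3072 to 1024 dimensions cuts raw vector storage to a third. The effect on retrieval quality is a separate question that this arithmetic does not address.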
How to use
After initializing the project, add dimensions: <your dimensions> to settings.yaml:
embeddings:
  async_mode: threaded # or asyncio
  llm:
    api_key: ${EMBEDDING_KEY}
    type: azure_openai_embedding # or openai_embedding
    model: text-embedding-3-large
    api_base: ${EMBEDDING_BASE}
    api_version: "2024-02-01"
    # organization: <organization_id>
    deployment_name: text-embedding-3-large
    dimensions: 1024
For querying, pass dimensions=<your dimensions> to OpenAIEmbedding (in graphrag_local_search.ipynb or the global search notebook):
text_embedder = OpenAIEmbedding(
    api_key=api_key,
    api_base=azure_endpoint,
    api_version=api_version,
    api_type=OpenaiApiType.AzureOpenAI,
    model=embedding_model,
    deployment_name=deployment_name,
    max_retries=20,
    dimensions=1024,
)
One thing still bothers me: after dimensions is set once in settings.yaml (or not set at all, in which case it defaults to dimensions=None and is not passed to OpenAI's client.embeddings.create), that value (either None or a specific number) persists throughout the same project even if settings.yaml is changed later. So every time I want a different value I have to make a new directory, run python -m graphrag.index --init --root ./ragtest, and set up settings.yaml again, as if starting a new project.
Hi!
Is this a revision of #1020?
Yes! Sorry, I forgot to close #1020.
Hi @AlonsoGuevara, I'm following the progress of this PR, and the dimensions parameter for embeddings is a useful enhancement that I would like to have in GraphRAG. Please let me know if there's anything I can do to help move this forward.