Add ‘dimensions’ parameter for OpenAIEmbedding
Description
This pull request adds a dimensions option to GraphRAG's embedding configuration so that users can request a user-defined embedding size. When the option is omitted, the model's default embedding dimensions are used, so existing setups are unaffected.
Related Issues
None.
Proposed Changes
Modified the GraphRAG embedding code to accept a dimensions parameter, enabling users to customize the embedding size. Added error checks to ensure the supplied dimensions value is within reasonable limits.
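As an illustration of the kind of error check described above, here is a minimal sketch. It is not the code from this PR: the helper name validate_dimensions and the upper bound of 3072 (the default size of text-embedding-3-large) are assumptions chosen for the example.

```python
from typing import Optional

# Assumed upper bound for illustration: the default output size of
# text-embedding-3-large. The limit used in the actual change may differ.
MAX_DIMENSIONS = 3072


def validate_dimensions(
    dimensions: Optional[int], max_dimensions: int = MAX_DIMENSIONS
) -> Optional[int]:
    """Hypothetical helper: return the value unchanged if it is usable.

    None means "not configured", so the model default is kept.
    """
    if dimensions is None:
        return None
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(dimensions, bool) or not isinstance(dimensions, int):
        raise TypeError(f"dimensions must be an int, got {type(dimensions).__name__}")
    if not 1 <= dimensions <= max_dimensions:
        raise ValueError(
            f"dimensions must be between 1 and {max_dimensions}, got {dimensions}"
        )
    return dimensions
```

Returning None for an unset value keeps the default path separate from the validated path, which matches the goal of leaving current usage untouched.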
Checklist
- [x] I have tested these changes locally.
- [x] I have reviewed the code changes.
- [ ] I have updated the documentation (if necessary).
- [ ] I have added appropriate unit tests (if applicable).
Additional Notes
According to https://openai.com/index/new-embedding-models-and-api-updates/, text-embedding-3-large has a default dimension size of 3072, which is not suitable for everyone. Excessive embedding dimensions can lead to significant computational and storage overhead without yielding proportional performance improvements.
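To make the storage side of this concrete, here is a rough back-of-the-envelope sketch. It assumes vectors are stored as 32-bit floats and uses a hypothetical corpus of one million text chunks; neither figure comes from this PR, and index overhead in a real vector store is ignored.

```python
BYTES_PER_FLOAT32 = 4  # assumption: embeddings are stored as float32


def embedding_storage_bytes(
    num_vectors: int, dimensions: int, bytes_per_value: int = BYTES_PER_FLOAT32
) -> int:
    """Raw size of the vectors only, without any index or metadata overhead."""
    return num_vectors * dimensions * bytes_per_value


num_chunks = 1_000_000  # hypothetical corpus size
full = embedding_storage_bytes(num_chunks, 3072)
reduced = embedding_storage_bytes(num_chunks, 1024)

print(f"3072 dims: {full / 1e9:.2f} GB")     # 3072 dims: 12.29 GB
print(f"1024 dims: {reduced / 1e9:.2f} GB")  # 1024 dims: 4.10 GB
```

Under these assumptions, dropping from 3072 to 1024 dimensions cuts raw vector storage to one third. Whether retrieval quality holds up at the smaller size depends on the data and should be checked separately.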
How to use
Add dimensions: <your dimensions> to settings.yaml after initializing the project:
embeddings:
  async_mode: threaded # or asyncio
  llm:
    api_key: ${EMBEDDING_KEY}
    type: azure_openai_embedding # or openai_embedding
    model: text-embedding-3-large
    api_base: ${EMBEDDING_BASE}
    api_version: "2024-02-01"
    # organization: <organization_id>
    deployment_name: text-embedding-3-large
    dimensions: 1024
Pass dimensions=<your dimensions> to OpenAIEmbedding (in graphrag_local_search.ipynb or in global search):
text_embedder = OpenAIEmbedding(
    api_key=api_key,
    api_base=azure_endpoint,
    api_version=api_version,
    api_type=OpenaiApiType.AzureOpenAI,
    model=embedding_model,
    deployment_name=deployment_name,
    max_retries=20,
    dimensions=1024,
)
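One way to keep default behavior unchanged is to forward dimensions to the embeddings request only when it has been set. The sketch below illustrates that idea with a hypothetical helper, build_embedding_kwargs; it is not the implementation in this PR, and the resulting dictionary is only assumed to be passed on to the embeddings API call.

```python
from typing import Any, Dict, List, Optional


def build_embedding_kwargs(
    model: str, texts: List[str], dimensions: Optional[int] = None
) -> Dict[str, Any]:
    """Hypothetical helper: assemble keyword arguments for an embeddings request."""
    kwargs: Dict[str, Any] = {"model": model, "input": texts}
    # Only include the key when the user configured it, so requests made
    # without the option look exactly as they did before.
    if dimensions is not None:
        kwargs["dimensions"] = dimensions
    return kwargs


default_args = build_embedding_kwargs("text-embedding-3-large", ["hello"])
custom_args = build_embedding_kwargs("text-embedding-3-large", ["hello"], dimensions=1024)
```

Omitting the key entirely, rather than sending a null value, avoids changing requests for models or deployments that do not accept a dimensions argument.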