graphrag
Perf optimizations in map_query_to_entities()
Description
Perf optimizations in map_query_to_entities().
Results of a perf benchmark run with PyTest on a list of 1M entities (roughly equivalent to calling map_query_to_entities() with 50K entities and the defaults k = 10 and oversample_scaler = 2):
- Original get_entity_by_key(): 2.15s
- Optimized get_entity_by_key(): 0.10s
- Lookup with get_entity_by_id(): 0.00s (too small to register at the precision PyTest reports)
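The 1M figure appears to follow from the defaults: assuming each of the k × oversample_scaler = 20 vector-store hits triggers one lookup, and each lookup in the original code scans the full entity list, that is 20 lookups × 50K entities, or about 1M entity comparisons in the worst case. Below is a minimal, self-contained timing sketch along those lines. The Entity dataclass and the scan_by_id()/lookup_by_id() helpers are hypothetical stand-ins, not graphrag's definitions, and absolute timings will vary by machine.

```python
from __future__ import annotations

import time
import uuid
from dataclasses import dataclass


@dataclass
class Entity:
    # Hypothetical stand-in for graphrag's Entity; only `id` matters here.
    id: str
    title: str


def scan_by_id(entities: list[Entity], value: str) -> Entity | None:
    # O(N): walk the list until an entity with a matching id is found.
    for entity in entities:
        if entity.id == value:
            return entity
    return None


def lookup_by_id(entities: dict[str, Entity], value: str) -> Entity | None:
    # O(1) on average: a single hash-table lookup.
    return entities.get(value)


def run(n_entities: int = 50_000, k: int = 10, oversample_scaler: int = 2) -> None:
    entities = [Entity(id=str(uuid.uuid4()), title=f"e{i}") for i in range(n_entities)]
    by_id = {e.id: e for e in entities}
    # Worst case for the scan: query the last k * oversample_scaler entities,
    # so each lookup walks (almost) the whole list.
    queries = [e.id for e in entities[-(k * oversample_scaler):]]

    start = time.perf_counter()
    scanned = [scan_by_id(entities, q) for q in queries]
    scan_s = time.perf_counter() - start

    start = time.perf_counter()
    looked_up = [lookup_by_id(by_id, q) for q in queries]
    lookup_s = time.perf_counter() - start

    assert scanned == looked_up  # both strategies must find the same entities
    print(f"{len(queries)} lookups over {n_entities} entities")
    print(f"linear scan: {scan_s:.4f}s")
    print(f"dict lookup: {lookup_s:.6f}s")


if __name__ == "__main__":
    run()
```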
Related Issues
#1275
Proposed Changes
- In the default case, where embedding_vectorstore_key == EntityVectorStoreKey.ID, take advantage of the fact that entities are already stored in a dictionary keyed by ID and perform an O(1) lookup instead of an O(N) scan. The lookup is implemented in a new function called get_entity_by_id().
- In the general case, optimize get_entity_by_key() by moving isinstance(), is_valid_uuid(), and replace() out of the loop and calling getattr() once per entity instead of twice.
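A minimal sketch of the two changes, assuming the entity dictionary is keyed by id and, as the replace() call suggests, that a UUID should also match its dash-less form. Entity, is_valid_uuid(), the replace("-", "") arguments, and the exact signatures are simplified stand-ins for illustration, not graphrag's actual definitions.

```python
from __future__ import annotations

import uuid
from collections.abc import Iterable
from dataclasses import dataclass


@dataclass
class Entity:
    # Hypothetical stand-in for graphrag's Entity model.
    id: str
    title: str


def is_valid_uuid(value: str) -> bool:
    # Hypothetical helper: True if `value` parses as a UUID.
    try:
        uuid.UUID(value)
    except ValueError:
        return False
    return True


def get_entity_by_key(
    entities: Iterable[Entity], key: str, value: str | int
) -> Entity | None:
    # General case: still O(N), but the loop-invariant checks run once, up front.
    if isinstance(value, str) and is_valid_uuid(value):
        # Assumption: accept both the dashed and the dash-less UUID form.
        candidates = (value, value.replace("-", ""))
        for entity in entities:
            if getattr(entity, key) in candidates:  # one getattr() per entity
                return entity
    else:
        for entity in entities:
            if getattr(entity, key) == value:
                return entity
    return None


def get_entity_by_id(entities: dict[str, Entity], value: str) -> Entity | None:
    # Default case (key is the id): O(1) average-case dictionary lookup.
    entity = entities.get(value)
    if entity is None and is_valid_uuid(value):
        entity = entities.get(value.replace("-", ""))
    return entity
```

With this split, the caller can dispatch to get_entity_by_id() when embedding_vectorstore_key == EntityVectorStoreKey.ID and fall back to get_entity_by_key() over the dictionary's values for any other key.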
Checklist
- [x] I have tested these changes locally.
- [x] I have reviewed the code changes.
- [ ] I have updated the documentation (if necessary).
- [x] I have added appropriate unit tests (if applicable).
@microsoft-github-policy-service agree company="Microsoft"