graphrag

Fix: bugs when using open-source models

Open PaulSZH95 opened this issue 1 year ago • 1 comments

Description

Bug fixes

Related Issues

#575 #528

Proposed Changes

1 - The clean_up_json function now parses from the first instance of '{' in LLM outputs. This accommodates open-source models, which tend to be more verbose.

2 - The embed function now decodes the encoded, chunked tokens back to text. This allows open-source models with a different tokenizer to still work.
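
The two changes above can be illustrated with a minimal sketch. This is not the code from this PR or from GraphRAG itself: the names parse_from_first_brace, chunks_as_text, toy_encode and toy_decode are hypothetical, and the character-level tokenizer is only a stand-in so the example runs without extra dependencies. In practice the encode/decode pair would presumably come from the tokenizer already used for chunking.

```python
import json
from typing import Callable, List


def parse_from_first_brace(llm_output: str) -> dict:
    """Parse the first JSON object in a possibly chatty LLM reply (hypothetical helper)."""
    start = llm_output.find("{")
    if start == -1:
        raise ValueError("no '{' found in LLM output")
    # raw_decode stops at the end of the first complete JSON value,
    # so any commentary after the closing brace is ignored.
    obj, _end = json.JSONDecoder().raw_decode(llm_output, start)
    return obj


def chunks_as_text(
    text: str,
    encode: Callable[[str], List[int]],
    decode: Callable[[List[int]], str],
    max_tokens: int,
) -> List[str]:
    """Split text into token-bounded chunks, then decode each chunk back to a string.

    Sending strings (not token ids) to the embedding endpoint means the
    server can re-tokenize them with whatever tokenizer its model uses.
    """
    tokens = encode(text)
    return [
        decode(tokens[i : i + max_tokens])
        for i in range(0, len(tokens), max_tokens)
    ]


# Toy character-level tokenizer, an assumption made purely for illustration.
def toy_encode(s: str) -> List[int]:
    return [ord(c) for c in s]


def toy_decode(ids: List[int]) -> str:
    return "".join(chr(i) for i in ids)


if __name__ == "__main__":
    reply = 'Sure! Here is the JSON:\n{"a": 1, "b": [2, 3]} Hope that helps.'
    print(parse_from_first_brace(reply))
    print(chunks_as_text("hello world", toy_encode, toy_decode, 4))
```

The chunk size is still measured in tokens of the local tokenizer, so it only approximates the limit of a model with a different tokenizer; a conservative max_tokens leaves some headroom.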

Checklist

  • [x] I have tested these changes locally.
  • [x] I have reviewed the code changes.
  • [x] I have updated the documentation (if necessary).
  • [x] I have added appropriate unit tests (if applicable).

Additional Notes

I have not tested with OpenAI's models, only with Groq for the llm and LM Studio for the embedding llm.

PaulSZH95, Jul 18 '24 09:07

@microsoft-github-policy-service agree

PaulSZH95, Jul 18 '24 09:07

We have resolved several issues related to text encoding and JSON parsing that are rolled up into version 0.2.2. Please try again with that version and re-open if this is still an issue.

(This may not resolve embedding formats, but our expectation is that any proxy will translate requests to maintain compatibility with the default GraphRAG LLM calls.)

natoverse, Aug 09 '24 17:08