Fix: bugs when using open-source models
Description
Bug fixes for issues encountered when running with open-source models.
Related Issues
#575 #528
Proposed Changes
1 - The `clean_up_json` function now parses from the first instance of `{` in LLM outputs. This accommodates open-source models, which tend to be more verbose and often wrap JSON in extra prose.
2 - The `embed` function now decodes the encoded token chunks back to text before embedding. This allows open-source models that use a different tokenizer to still work.
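The first change can be sketched as follows. This is a minimal illustration, not the actual GraphRAG implementation: it assumes the fix amounts to slicing the raw LLM response from the first `{` to the last `}` before handing it to the JSON parser.

```python
import json


def clean_up_json(text: str) -> dict:
    """Extract the first JSON object from a possibly verbose LLM response.

    Verbose open-source models often prepend prose ("Sure! Here is the
    result: ...") or append commentary after the JSON. Slicing from the
    first '{' to the last '}' recovers the object in those cases.
    """
    start = text.find("{")
    end = text.rfind("}")
    if start == -1 or end == -1 or end < start:
        raise ValueError("no JSON object found in LLM output")
    return json.loads(text[start : end + 1])
```

For example, `clean_up_json('Sure! Here is the result: {"a": 1}')` returns `{"a": 1}` instead of raising a parse error on the leading prose.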
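The second change addresses a tokenizer mismatch: chunks are produced as token IDs by one tokenizer, but an open-source embedding backend may use a different one, so the IDs cannot be sent directly. A hedged sketch of the idea, with `decode` and `embed_text` as hypothetical stand-ins for the tokenizer's decoder and the embedding call:

```python
from typing import Callable, List


def embed(
    chunks: List[List[int]],
    decode: Callable[[List[int]], str],
    embed_text: Callable[[str], List[float]],
) -> List[List[float]]:
    """Decode token-ID chunks back to plain text, then embed the text.

    Sending plain text instead of token IDs keeps the pipeline compatible
    with embedding models whose tokenizer differs from the one used for
    chunking.
    """
    return [embed_text(decode(chunk)) for chunk in chunks]
```

In GraphRAG the decoder would typically be the chunking tokenizer's own `decode` (e.g. a tiktoken encoding), so the embedding endpoint only ever sees text it can re-tokenize itself.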
Checklist
- [x] I have tested these changes locally.
- [x] I have reviewed the code changes.
- [x] I have updated the documentation (if necessary).
- [x] I have added appropriate unit tests (if applicable).
Additional Notes
I have not tested with OpenAI's models, only with `llm: groq` and `embedding: lmstudio`.
@microsoft-github-policy-service agree
We have resolved several issues related to text encoding and JSON parsing that are rolled up into version 0.2.2. Please try again with that version and re-open this issue if the problem persists.
(this may not resolve embeddings formats, but our expectation is that any proxy will translate to maintain compatibility with the default GraphRAG LLM calls)