Fix: bugs when using open-source models
Description
Bug fixes for issues encountered when running with open-source models.
Related Issues
#575 #528
Proposed Changes
1 - The `clean_up_json` function now parses from the first instance of `{` in LLM outputs. This accommodates open-source models, which tend to be more verbose and often wrap JSON in extra prose.
2 - The `embed` function now decodes the encoded token chunks back to text before embedding. This allows open-source models that use a different tokenizer to still work.
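The first change can be sketched as follows. This is a minimal illustration, not the actual GraphRAG implementation: it assumes the fix amounts to slicing the raw LLM response from the first `{` to the last `}` before handing it to the JSON parser.

```python
import json


def clean_up_json(text: str) -> dict:
    """Extract the first JSON object from a possibly verbose LLM response.

    Verbose open-source models often prepend prose ("Sure! Here is the
    result: ...") or append commentary after the JSON. Slicing from the
    first '{' to the last '}' recovers the object in those cases.
    """
    start = text.find("{")
    end = text.rfind("}")
    if start == -1 or end == -1 or end < start:
        raise ValueError("no JSON object found in LLM output")
    return json.loads(text[start : end + 1])
```

For example, `clean_up_json('Sure! Here is the result: {"a": 1}')` returns `{"a": 1}` instead of raising a parse error on the leading prose.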
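The second change addresses a tokenizer mismatch: chunks are produced as token IDs by one tokenizer, but an open-source embedding backend may use a different one, so the IDs cannot be sent directly. A hedged sketch of the idea, with `decode` and `embed_text` as hypothetical stand-ins for the tokenizer's decoder and the embedding call:

```python
from typing import Callable, List


def embed(
    chunks: List[List[int]],
    decode: Callable[[List[int]], str],
    embed_text: Callable[[str], List[float]],
) -> List[List[float]]:
    """Decode token-ID chunks back to plain text, then embed the text.

    Sending plain text instead of token IDs keeps the pipeline compatible
    with embedding models whose tokenizer differs from the one used for
    chunking.
    """
    return [embed_text(decode(chunk)) for chunk in chunks]
```

In GraphRAG the decoder would typically be the chunking tokenizer's own `decode` (e.g. a tiktoken encoding), so the embedding endpoint only ever sees text it can re-tokenize itself.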
Checklist
- [x] I have tested these changes locally.
- [x] I have reviewed the code changes.
- [x] I have updated the documentation (if necessary).
- [x] I have added appropriate unit tests (if applicable).
Additional Notes
I have not tested with OpenAI's models, only with `llm: groq` and `embedding: lmstudio`.
@microsoft-github-policy-service agree
We have resolved several issues related to text encoding and JSON parsing that are rolled up into version 0.2.2. Please try again with that version and re-open this issue if the problem persists.
(this may not resolve embeddings formats, but our expectation is that any proxy will translate to maintain compatibility with the default GraphRAG LLM calls)