
Fix Issue #1885: Extract JSON from Code Blocks and Handle Malformed JSON with json_repair

Open clarkandrew opened this issue 1 year ago • 2 comments

Description

This PR improves the robustness of JSON handling in memory/main.py by introducing two key enhancements:

  1. Fixes #1885 with Regular Expression Parsing: Extracts JSON from LLM responses wrapped in markdown code blocks (e.g., ```json ... ```). This addresses the issue where models often return JSON in such formats, ensuring accurate parsing and processing.

  2. Integrates json_repair: Replaces json.loads with json_repair.loads to fix minor JSON formatting errors (e.g., missing brackets or braces, missing commas, or stray words around the payload). This avoids re-running an entire LLM request because of a small JSON issue.
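
For illustration, the code-block extraction described in item 1 could look roughly like the sketch below. It is not the actual diff applied to memory/main.py: the helper name `extract_json_text` and the exact pattern are assumptions made for this example. The fence marker is built from single backticks only so the snippet can itself sit inside a fenced block.

```python
import json
import re

# Three backticks, built programmatically so this example can live in a fence.
FENCE = "`" * 3

# Assumed pattern: a fenced block, optionally tagged "json", captured lazily.
_FENCE_RE = re.compile(FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE, re.DOTALL)


def extract_json_text(response: str) -> str:
    """Hypothetical helper: return the fenced payload if one exists,
    otherwise the stripped response unchanged."""
    match = _FENCE_RE.search(response)
    return match.group(1) if match else response.strip()


# An LLM reply that wraps its JSON in a markdown code block.
reply = "Here you go:\n" + FENCE + 'json\n{"facts": ["likes tea"]}\n' + FENCE
parsed = json.loads(extract_json_text(reply))
print(parsed)
```

Falling back to the stripped response when no fence is found means replies that are already bare JSON take the same code path.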

The json-repair dependency has been added to pyproject.toml to support these improvements. These changes enhance the system’s stability and data integrity when handling dynamic JSON data.

Why json_repair?

Some LLMs, even with structured output, occasionally produce JSON that isn't fully valid. Common mistakes include missing quotes, misplaced commas, or malformed arrays and objects. Although these errors are typically minor, they can break JSON parsing and force unnecessary retries of entire requests.

I initially searched for a lightweight Python package that could fix such issues reliably but couldn't find one. So I developed json_repair, which addresses:

  • Syntax errors: Fixes missing quotes, misplaced commas, unescaped characters, and other typical JSON mistakes.
  • Malformed arrays/objects: Repairs incomplete arrays or objects by adding necessary elements to ensure structural integrity.

Incorporating this into mem0 improves the system's ability to handle edge cases where malformed JSON would otherwise lead to failed requests and retries.
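
A minimal sketch of the swap from json.loads to json_repair.loads, assuming the json-repair package is installed. The wrapper `parse_llm_json` and the fallback to strict parsing when the package is missing are assumptions added for this example, not part of the PR.

```python
import json

try:
    # Third-party package named in this PR (installed as "json-repair").
    import json_repair
except ImportError:
    # Assumption for this sketch only: degrade to strict parsing.
    json_repair = None


def parse_llm_json(text: str):
    """Hypothetical wrapper: use the tolerant parser when available,
    otherwise the standard-library strict parser."""
    if json_repair is not None:
        return json_repair.loads(text)
    return json.loads(text)


# Valid JSON parses the same way with either parser.
print(parse_llm_json('{"facts": ["likes tea"]}'))

# A trailing comma and a missing closing brace: the strict parser rejects
# this input, while json_repair is designed to recover from such slips.
malformed = '{"facts": ["likes tea",]'
try:
    print(parse_llm_json(malformed))
except json.JSONDecodeError as exc:
    print(f"strict parser rejected input: {exc}")
```

Keeping the call behind a single wrapper makes it easy to see which parse sites tolerate malformed input and which still fail fast.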

Type of change

  • [x] Bug fix (non-breaking change which fixes an issue)
  • [x] New feature (non-breaking change which adds functionality)

How Has This Been Tested?

  • [x] Unit Test

Checklist:

  • [x] My code follows the style guidelines of this project
  • [x] I have performed a self-review of my own code
  • [x] I have commented my code, particularly in hard-to-understand areas
  • [x] I have made corresponding changes to the documentation
  • [x] My changes generate no new warnings
  • [x] I have added tests that prove my fix is effective or that my feature works
  • [x] New and existing unit tests pass locally with my changes
  • [x] Any dependent changes have been merged and published in downstream modules
  • [x] I have checked my code and corrected any misspellings

clarkandrew avatar Oct 22 '24 04:10 clarkandrew

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

Drew seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

CLAassistant avatar Oct 22 '24 04:10 CLAassistant

Hey @clarkandrew Can you please sign the CLA and also resolve the merge conflicts?

Dev-Khant avatar Jan 09 '25 11:01 Dev-Khant

Closing this PR as it's stale and already fixed.

parshvadaftari avatar Sep 11 '25 19:09 parshvadaftari