
fix pydantic parser: JSONDecodeError

Open Lin-jun-xiang opened this issue 1 year ago • 3 comments

Problem

parser.parse() cannot handle unescaped double quotation marks (") inside string values. For example, take the following completion from a chain response:

text = """
Answer: {
  "product_name": "Control4® T3 Series 7\" Tabletop Touch Screen",
  "manufactured_date": "2015-10-09",
  "size_inch": "7.0\" × 5.1\" × 0.75\"",
  "resolution": "1280 × 800",
  "contrast": "",
  "operation_temperature": "32 ~ 104˚F (0˚ ~ 40˚C)",
  "power_supply": "PoE (IEEE802.3af) 100VAC ~ 240VAC, 50-60 Hz International power supply adapters included",
  "sunlight_readable": false,
  "antiglare": false,
  "low_power_consumption": false,
  "high_brightness": false,
  "wide_temperature": false,
  "fast_response": true,
  "screen_features": ["Capacitive touch", "Camera: 720p", "HD camera", "Speakers and microphone", "Video intercom integration", "Audio intercom integration"]
}
"""

import json
import re

# Minimal excerpt of the parser's parse() logic :)
try:
    # Greedy search for 1st json candidate.
    match = re.search(
        r"\{.*\}", text.strip(), re.MULTILINE | re.IGNORECASE | re.DOTALL
    )
    json_str = ""
    if match:
        json_str = match.group()
    # strict=False allows control characters inside strings; it does not
    # tolerate unescaped quotation marks.
    json_object = json.loads(json_str, strict=False)

except Exception as e:
    print(e)

Wrong Result

Expecting ',' delimiter: line 2 column 43 (char 44)
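
Note that `text` above is a non-raw Python string literal, so each `\"` evaluates to a bare `"` before `json.loads` ever sees it. The inner quotation mark then closes the string value early, and the decoder expects a `,` or `}` next. A minimal sketch of the same failure, using short illustrative strings that are not taken from the PR:

```python
import json

# The inner quote after 7 closes the string value early.
broken = '{"size": "7" Tabletop"}'
# The same value with the inner quote escaped as \" is valid JSON.
escaped = '{"size": "7\\" Tabletop"}'

error_message = None
try:
    json.loads(broken)
except json.JSONDecodeError as err:
    error_message = err.msg

size = json.loads(escaped)["size"]

print(error_message)
print(size)
```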

Solution

Let's add a remove_illegal_quatations function that does the following:

  1. Remove the illegal quotation marks on each line, i.e. every quotation mark other than the first, second, third, and last one.
  2. Leave list values untouched, e.g. ["Capacitive touch", "Camera: 720p", ...].
  3. Leave true/false values untouched, since they are not quoted, e.g. "sunlight_readable": false.
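
The three rules above could be sketched roughly as follows. This is not the PR's actual implementation: the name `remove_illegal_quotations_sketch` and the regular expression are assumptions made for illustration, and the sketch assumes one key/value pair per line, as in the example completion.

```python
import json
import re


def remove_illegal_quotations_sketch(json_str: str) -> str:
    """Hypothetical helper: drop stray double quotes inside string values."""
    cleaned_lines = []
    for line in json_str.splitlines():
        # Positions of every double quote on this line.
        quote_positions = [i for i, ch in enumerate(line) if ch == '"']
        # Only '"key": "value"' lines qualify: the value must open with a quote.
        # This skips list values ("key": [...]) and bare literals ("key": false).
        is_string_value = re.match(r'\s*"[^"]*"\s*:\s*"', line) is not None
        # Lines with four or fewer quotes have nothing to remove.
        if not is_string_value or len(quote_positions) <= 4:
            cleaned_lines.append(line)
            continue
        # Keep the 1st, 2nd, 3rd and last quote; remove the ones in between.
        illegal = set(quote_positions[3:-1])
        cleaned_lines.append(
            "".join(ch for i, ch in enumerate(line) if i not in illegal)
        )
    return "\n".join(cleaned_lines)


raw = '{\n  "product_name": "T3 Series 7" Tabletop",\n  "fast_response": true\n}'
parsed = json.loads(remove_illegal_quotations_sketch(raw))
print(parsed["product_name"])
```

Removing the stray quotation marks changes the data slightly (7" becomes 7). The sketch also does not handle values that already contain a correctly escaped \" sequence, nor several key/value pairs on one line.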

Lin-jun-xiang avatar Jul 06 '23 13:07 Lin-jun-xiang

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Ignored Deployment
Name Status Preview Comments Updated (UTC)
langchain ⬜️ Ignored (Inspect) Jul 21, 2023 2:52am

vercel[bot] avatar Jul 06 '23 13:07 vercel[bot]

@CodiumAI-Agent please review

coditamar avatar Jul 07 '23 03:07 coditamar

PR Analysis

  • 🎯 Main theme: Fixing the Pydantic parser to handle illegal quotations in JSON strings
  • 🔍 Description and title: Yes
  • 📌 Type of PR: Bug fix
  • 🧪 Relevant tests added: Yes
  • ✨ Minimal and focused: Yes, the PR is focused on fixing a specific issue with the Pydantic parser and all changes are relevant to this issue.
  • 🔒 Security concerns: No, the PR does not introduce any obvious security concerns.

PR Feedback

  • 💡 General PR suggestions: The PR is well-structured and the changes are relevant to the issue at hand. The new function _remove_illegal_quatations is a good approach to handle the issue. However, the function could be improved by handling edge cases and making the code more readable.

  • 🤖 Code suggestions:

    • relevant file: langchain/output_parsers/pydantic.py suggestion content: Consider adding a docstring to the _remove_illegal_quatations function to explain its purpose and how it works. [medium]

    • relevant file: langchain/output_parsers/pydantic.py suggestion content: The _remove_illegal_quatations function assumes that there are always at least four quotation marks in a line. This might not always be the case. Consider adding a check to handle lines with less than four quotation marks. [important]

    • relevant file: langchain/output_parsers/pydantic.py suggestion content: The _remove_illegal_quatations function could be made more readable by using named constants for the indices instead of numbers. For example, FIRST_QUOTE_INDEX = 0, SECOND_QUOTE_INDEX = 1, etc. [medium]

    • relevant file: tests/unit_tests/output_parsers/test_pydantic_parser.py suggestion content: Consider adding more test cases to cover different scenarios and edge cases. For example, a test case where a line has less than four quotation marks. [important]

How to use

Tag me in a comment '@CodiumAI-Agent' to ask for a new review after you update the PR. You can also tag me and ask any question, for example '@CodiumAI-Agent is the PR ready for merge?'

CodiumAI-Agent avatar Jul 07 '23 03:07 CodiumAI-Agent