partial-json-parser-js

🚀 Feature Request: Extract JSON from noisy or wrapped AI model output

Open • ErfanBahramali opened this issue 5 months ago • 1 comment

Hi! First, thank you for this great library; it’s very helpful.

I’d like to suggest an enhancement based on common challenges when working with AI models, especially LLMs that aren’t always very consistent.


✏️ Problem

When using LLMs to generate JSON, the output often includes extra text before or after the actual JSON.
For example:

Hello! Here is your result:
{"key":"value"}

Or sometimes the output is formatted as code blocks:

````
```json
[{ "key": "value" }]```
````

In such cases, calling parse directly fails, since the string isn’t valid JSON.


💡 Proposed solution

Add an option (like other enums) that allows the parser to:

  • Detect and extract the first JSON snippet from the text, ignoring non‑JSON prefixes or suffixes.
  • Optionally specify whether to search for an object ({...}) or an array ([...]) depending on what the user expects.
  • Optionally fix slightly malformed JSON, e.g. removing extra characters between braces or brackets.
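
The first two bullets could be sketched as follows. This is only an illustration under stated assumptions: `extractFirstJson`, `findMatchingClose`, and the `expect` parameter are hypothetical names, not part of partial-json, and the sketch relies on the built-in `JSON.parse`, so it handles complete JSON only. It scans for an opening brace or bracket, finds the matching close while skipping string contents, and moves on if the candidate does not parse. The third bullet (repairing malformed JSON) is not covered.

```typescript
// Hypothetical helper, not part of partial-json: return the first balanced
// JSON object or array found inside arbitrary text, or undefined if none parses.
type Expect = "object" | "array" | "any";

function extractFirstJson(text: string, expect: Expect = "any"): unknown {
  const openers = expect === "object" ? "{" : expect === "array" ? "[" : "{[";
  for (let start = 0; start < text.length; start++) {
    if (!openers.includes(text[start])) continue;
    const end = findMatchingClose(text, start);
    if (end === -1) continue; // never closed; try the next opener
    try {
      return JSON.parse(text.slice(start, end + 1));
    } catch {
      // Balanced but not valid JSON (for example "[DONE]"); keep scanning.
    }
  }
  return undefined;
}

// Index of the bracket that closes the one at `start`, or -1 if there is none.
// Brackets inside double-quoted strings are ignored.
function findMatchingClose(text: string, start: number): number {
  let depth = 0;
  let inString = false;
  for (let i = start; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      if (ch === "\\") i++; // skip the escaped character
      else if (ch === '"') inString = false;
    } else if (ch === '"') {
      inString = true;
    } else if (ch === "{" || ch === "[") {
      depth++;
    } else if (ch === "}" || ch === "]") {
      depth--;
      if (depth === 0) return i;
    }
  }
  return -1;
}
```

With this approach, `extractFirstJson('[DONE] {"a":1}')` skips `[DONE]` because it fails to parse and returns the object that follows; passing `"array"` or `"object"` restricts which opening character is considered.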

Example utility function to extract JSON from text:

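export function extractJsonFromText(text: string): unknown[] {
  // String.prototype.match returns null when nothing matches, so fall back to an empty list.
  const matches = text.match(/[{\[]{1}([,:{}\[\]0-9.\-+Eaeflnr-u \n\r\t]|".*?")+[}\]]{1}/gis) ?? [];
  // Parse every candidate and flatten one level, so top-level arrays are merged into the result.
  return matches.map((m) => JSON.parse(m)).flat();
}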

✅ Example usage

import { parse } from "partial-json";

const result = parse('```{"key":"value"}```');
// Ideally: result => { key: 'value' }
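
If extraction is treated as a pre-processing step instead, the fence in this example could be stripped before parsing. Below is a minimal sketch, assuming a hypothetical helper `stripCodeFence` that is not part of partial-json. It also tolerates a closing fence that has not arrived yet, which matters for streamed output. The demo uses the built-in `JSON.parse` to stay self-contained; with partial-json, the stripped text would presumably be passed to `parse` instead.

```typescript
// Hypothetical pre-processing helper; not part of partial-json.
function stripCodeFence(text: string): string {
  return text
    .trim()
    .replace(/^```(?:json)?\s*/i, "") // opening fence with an optional "json" tag
    .replace(/\s*```$/, ""); // closing fence, if present
}

const complete = stripCodeFence('```{"key":"value"}```');
// complete === '{"key":"value"}'
const streaming = stripCodeFence('```json\n{"key": "val');
// streaming === '{"key": "val' (still incomplete; left for a partial parser)

// Stand-in for the real parser, which is not imported here.
const parsed = JSON.parse(complete);
```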

Looking forward to your feedback! 🌱

ErfanBahramali, Jul 13 '25 13:07

This is a good question! Others have requested a similar feature too:

  • #10

I recommend making a separate tool that extracts JSON from text using the regex you provided, because:

  1. The JSON detection and extraction process is outside the main logic of partial JSON parsing. It acts as a pre-processing step, so it can be done in a separate tool without changes to this package.
  2. I prefer packages to be atomic, meaning each package does only the minimum needed for its own functionality. partial-json means incomplete JSON, not malformed JSON or potential JSON inside text. partial-json-extractor would be a suitable name for your case.
  3. Modern LLMs do produce JSON well. In my own experience, I seldom see results like a missing \n before the ending fence ```, so the JSON detection regex can be much simpler.
  4. Not every valid JSON substring is what people want. Strings like [1] may occur in normal responses, so detecting valid JSON too eagerly may be too aggressive for common use cases. For example, when an LLM outputs [, it is likely to be starting [DONE] rather than a JSON list.
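
Point 4 can be illustrated with a short sketch. It assumes a hypothetical, deliberately conservative helper `extractFencedJson` (not part of partial-json, and not necessarily the simpler regex mentioned in point 3) that only trusts JSON inside a fenced block and otherwise returns `undefined` rather than guessing. The `eager` line shows how a naive search for the first bracketed span picks up a citation marker instead.

```typescript
// Hypothetical pattern: a fenced block with an optional "json" tag.
const FENCED = /```(?:json)?\s*([\s\S]*?)\s*```/i;

// Hypothetical helper, not part of partial-json: parse only fenced content.
function extractFencedJson(text: string): unknown {
  const match = FENCED.exec(text);
  if (match === null) return undefined; // no fence: do not guess
  try {
    return JSON.parse(match[1]);
  } catch {
    return undefined; // fenced, but not valid JSON
  }
}

const reply = 'As noted in [1], the result is:\n```json\n{"key": "value"}\n```';

// Eager: the first bracketed span is the citation marker, which is valid JSON.
const eager = reply.match(/\[[^\[\]]*\]/)?.[0]; // "[1]"

// Conservative: the fenced payload, i.e. { key: "value" }.
const fenced = extractFencedJson(reply);
```

The trade-off is that unfenced JSON is ignored entirely, so this policy only suits models that reliably wrap their output in a fence.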

CNSeniorious000, Jul 14 '25 07:07