🚀 Feature Request: Extract JSON from noisy or wrapped AI model output

Open ErfanBahramali opened this issue 5 months ago • 1 comments

Hi! First, thank you for this great library; it’s very helpful.

I’d like to suggest an enhancement based on common challenges when working with AI models, especially LLMs that aren’t always very consistent.

✏️ Problem

When using LLMs to generate JSON, the output often includes extra text before or after the actual JSON.
For example:

Hello! Here is your result:
{"key":"value"}

Or sometimes the output is formatted as code blocks:

```json
[{ "key": "value" }]```

In such cases, calling parse directly fails, since the string isn’t valid JSON.

💡 Proposed solution

Add an option (like the other enums) that allows the parser to:

- Detect and extract the first JSON snippet from the text, ignoring non‑JSON prefixes or suffixes.
- Optionally specify whether to search for an object ({...}) or an array ([...]), depending on what the user expects.
- Optionally fix slightly malformed JSON, e.g. by removing extra characters between braces or brackets.

Example utility function to extract JSON from text:

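export function extractJsonFromText(text: string): unknown[] {
  // match() returns null when nothing matches, so fall back to an empty list.
  const matches = text.match(/[{\[]([,:{}\[\]0-9.\-+Eaeflnr-u \n\r\t]|".*?")+[}\]]/gis) ?? [];
  // JSON.parse still throws if a matched candidate is not valid JSON.
  return matches.map((m) => JSON.parse(m)).flat();
}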
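As an alternative to a regex, the first two points of the proposal (extract the first snippet, optionally restricted to an object or an array) could be sketched with a small bracket-balancing scanner. Everything here is hypothetical: `extractFirstJson`, `findBalancedEnd`, and the `expect` parameter are illustrative names and are not part of partial-json. The sketch validates candidates with the built-in `JSON.parse`.

```typescript
// Hypothetical helper, not part of partial-json.
type Expect = "object" | "array" | "any";

// Returns the first balanced {...} or [...] span that parses as JSON,
// or undefined when no such span exists.
function extractFirstJson(text: string, expect: Expect = "any"): unknown {
  const openers = expect === "object" ? "{" : expect === "array" ? "[" : "{[";
  for (let start = 0; start < text.length; start++) {
    if (!openers.includes(text.charAt(start))) continue;
    const end = findBalancedEnd(text, start);
    if (end === -1) continue;
    try {
      return JSON.parse(text.slice(start, end + 1));
    } catch {
      // Balanced but not valid JSON (for example "[DONE]"); keep scanning.
    }
  }
  return undefined;
}

// Returns the index of the bracket that closes the one at `start`, or -1.
// Brackets inside double-quoted strings are ignored.
function findBalancedEnd(text: string, start: number): number {
  let depth = 0;
  let inString = false;
  for (let i = start; i < text.length; i++) {
    const ch = text.charAt(i);
    if (inString) {
      if (ch === "\\") i++; // skip the escaped character
      else if (ch === '"') inString = false;
    } else if (ch === '"') {
      inString = true;
    } else if (ch === "{" || ch === "[") {
      depth++;
    } else if (ch === "}" || ch === "]") {
      depth--;
      if (depth === 0) return i;
    }
  }
  return -1;
}

extractFirstJson('Hello! Here is your result:\n{"key":"value"}'); // { key: "value" }
extractFirstJson('[DONE] {"a": [1, 2]}'); // { a: [1, 2] }
```

The scanner only counts depth and leaves the decision about validity to `JSON.parse`, so a mismatched pair such as `{]` is rejected at the parsing step rather than by the scanner.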

✅ Example usage

import { parse } from "partial-json";

const result = parse('```{"key":"value"}```');
// Ideally: result => { key: 'value' }

Looking forward to your feedback! 🌱

Jul 13 '25 13:07 ErfanBahramali

This is a good question! There were people requesting a similar feature too:

I recommend making a separate tool that extracts JSON from text using the regex you provided, because:

- The JSON detection / extraction process is outside the main logic of partial JSON parsing. It acts as a pre-processing step, so it is achievable outside this library.
- I prefer packages to be atomic, meaning that each package does only the minimum needed for its own functionality. partial-json means incomplete JSON, rather than malformed JSON or potential JSON inside text. partial-json-extractor would be a suitable name for your case.
- Modern LLMs do produce JSON well. In my experience, I seldom see results such as a missing \n before the ending fence ```. So the JSON detection regex can be much simpler, like this
- Not every valid JSON substring is what people want. Strings like [1] may occur in normal responses. Detecting valid JSON too eagerly may be too aggressive for common use cases. For example, when an LLM outputs [, it is likely starting [DONE] rather than a JSON list.
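The simpler regex mentioned above is not shown in this thread. As a stand-in, here is a hypothetical sketch of the pre-processing step: unwrap a single Markdown code fence (with an optional `json` tag) and hand the inner text to a parser. `unwrapCodeFence`, `TICKS`, and `FENCE` are illustrative names, not part of partial-json. The sketch uses the built-in `JSON.parse` to stay dependency-free; in practice the unwrapped string would presumably be passed to `parse` from partial-json instead.

```typescript
// Hypothetical pre-processing helper; not part of partial-json.
// The fence marker is built with repeat() so that this snippet does not
// itself contain a literal run of three backticks.
const TICKS = "`".repeat(3);

// Opening fence, optional "json" tag, lazily captured body, closing fence.
// A newline before the closing fence is allowed but not required.
const FENCE = new RegExp(TICKS + "(?:json)?\\s*([\\s\\S]*?)\\s*" + TICKS, "i");

function unwrapCodeFence(text: string): string {
  const match = FENCE.exec(text);
  // Without a fence, return the trimmed input unchanged.
  return (match?.[1] ?? text).trim();
}

const fenced = TICKS + 'json\n[{ "key": "value" }]' + TICKS;
const result = JSON.parse(unwrapCodeFence(fenced)); // [{ key: "value" }]
```

Because this step only looks for fences, it never inspects bare brackets in prose, so a `[DONE]` marker outside a fence cannot be mistaken for a JSON list.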

Jul 14 '25 07:07 CNSeniorious000