🚀 Feature Request: Extract JSON from noisy or wrapped AI model output
Hi! First, thank you for this great library; it's very helpful.
I'd like to suggest an enhancement based on a common challenge when working with AI models, especially LLMs, whose output isn't always consistent.
✏️ Problem
When using LLMs to generate JSON, the output often includes extra text before or after the actual JSON.
For example:
```
Hello! Here is your result:
{"key":"value"}
```
Or sometimes the output is formatted as code blocks:
````
```json
[{ "key": "value" }]```
````
In such cases, calling `parse` directly fails because the string isn't valid JSON.
💡 Proposed solution
Add an option (like other enums) that allows the parser to:
- Detect and extract the first JSON snippet from the text, ignoring non‑JSON prefixes or suffixes.
- Optionally specify whether to search for an object (`{...}`) or an array (`[...]`), depending on what the user expects.
- Optionally fix slightly malformed JSON, e.g. by removing extra characters between braces or brackets.
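The first two points of the proposal could be sketched as a scanner that finds the first balanced `{...}` or `[...]` snippet, ignores brackets inside string literals, and keeps scanning when a candidate doesn't parse. `extractFirstJson`, `findMatchingClose`, and the `JsonKind` option are hypothetical names chosen for illustration. They are not part of partial-json's API. This sketch also makes no attempt at the third point (repairing malformed JSON).

```typescript
// Hypothetical option for the proposal above; not part of partial-json.
type JsonKind = "object" | "array" | "any";

// Returns the first balanced {...} or [...] snippet in `text` that parses as JSON,
// or undefined if there is none. Brackets inside string literals are ignored.
export function extractFirstJson(text: string, kind: JsonKind = "any"): unknown {
  const openers = kind === "object" ? "{" : kind === "array" ? "[" : "{[";
  for (let start = 0; start < text.length; start++) {
    if (!openers.includes(text.charAt(start))) continue;
    const end = findMatchingClose(text, start);
    if (end === -1) continue;
    try {
      return JSON.parse(text.slice(start, end + 1));
    } catch {
      // Balanced but not valid JSON (e.g. "[DONE]"); keep scanning.
    }
  }
  return undefined;
}

// Returns the index of the bracket that closes the one at `start`, or -1.
function findMatchingClose(text: string, start: number): number {
  const expected: string[] = []; // stack of closing brackets still owed
  let inString = false;
  for (let i = start; i < text.length; i++) {
    const ch = text.charAt(i);
    if (inString) {
      if (ch === "\\") i++; // skip the escaped character
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === "{" || ch === "[") expected.push(ch === "{" ? "}" : "]");
    else if (ch === "}" || ch === "]") {
      if (expected.pop() !== ch) return -1; // mismatched bracket
      if (expected.length === 0) return i;
    }
  }
  return -1; // never closed
}
```

With `kind` set to `"object"`, the fenced example `[{ "key": "value" }]` yields the inner object rather than the array. Scanning from every opener is quadratic in the worst case. That is acceptable for a sketch but worth revisiting for very long responses.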
Example utility function to extract JSON from text:
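```typescript
// Finds every {...} or [...] substring that looks like JSON and parses it.
export function extractJsonFromText(text: string): unknown[] {
  // `match` returns null (not an empty array) when nothing matches.
  const matches = text.match(/[{\[]{1}([,:{}\[\]0-9.\-+Eaeflnr-u \n\r\t]|".*?")+[}\]]{1}/gis) ?? [];
  const parsed = matches.flatMap((m) => {
    try { return [JSON.parse(m)]; } catch { return []; } // skip regex false positives
  });
  return parsed.flat(); // spread top-level arrays into the result
}
```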
✅ Example usage
```typescript
import { parse } from "partial-json";
const result = parse('```{"key":"value"}```');
// Ideally: result => { key: 'value' }
```
Looking forward to your feedback! 🌱
This is a good question! Other people have requested a similar feature too:
- #10
I'd recommend making a separate tool that extracts JSON from text using the regex you provided, because:
- JSON detection and extraction fall outside the main logic of partial JSON parsing. Extraction is a pre-processing step, so it can be done separately before parsing.
- I prefer packages to be atomic, meaning each package does only the minimum needed for its own piece of the overall functionality. `partial-json` means *incomplete* JSON, not malformed JSON or JSON embedded in text. `partial-json-extractor` would be a suitable name for your case.
- Modern LLMs produce JSON well. In my experience, I seldom see output that is missing the `\n` before the closing `` ``` `` fence. So the JSON detection regex can be much simpler, like this
- Not every valid JSON substring is what people want. Strings like `[1]` may occur in normal responses, so detecting valid JSON too eagerly may be too aggressive for common use cases. For example, when an LLM outputs `[`, it is more likely to be starting `[DONE]` than a JSON list.
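The simpler regex mentioned above isn't included in this thread. As an assumption, a fence-only pre-processing step could look like the following. It relies on the closing `` ``` `` starting on its own line and only looks inside code fences, so a stray `[1]` or `[DONE]` in prose is never treated as JSON. `stripCodeFence` is a hypothetical helper name. It is not an existing partial-json export.

```typescript
// Hypothetical pre-processing helper, kept outside the parser as suggested above.
// Returns the body of the first fenced code block (optionally tagged `json`),
// or the trimmed input unchanged when no such fence is found.
export function stripCodeFence(text: string): string {
  // Assumes the closing fence starts on its own line, as modern LLMs usually emit.
  const match = /```(?:json)?[ \t]*\r?\n([\s\S]*?)\r?\n```/i.exec(text);
  return match?.[1] ?? text.trim();
}
```

The result can then be passed to `parse` from partial-json, which handles the genuinely incomplete case. Output like the issue's `` [{ "key": "value" }]``` `` (no newline before the fence) is returned unchanged, which matches the trade-off described above.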