agent-zero
agent-zero copied to clipboard
fix: HTML tag regex pattern to prevent infinite loop on setMessage
Summary
- Change to webui/js/message.js:944 tegRegex pattern to improve robustness to < characters from not real tags / unmatched tags
Background
- After letting my agents get well into file creation (and trying to reflect on their incorrect syntax) the agents eventually generated a response that, on parsing by webui/js/message.js:946 causes the UI to enter an endless loop
- My regex is even weaker than my js but the amended Windsurf generated pattern seemed to handle the payload + the entirety of my other chat histories without apparent rendering side effects
GPT5 Analysis (Regex is outside of my balliwick sorry)
- The input includes non-HTML “double less-than” heredoc marker:
<< 'EOF', and also many pseudo-XML tags like<thought>,<analysis>, etc. - The splitting regex
tagRegexis a naïve HTML matcher with nested alternation and a repeating group:(< (?: [^<>"']+ | "..." | '...' )* >)- When it meets sequences starting with
<that are not real tags (e.g.,<< 'EOF') or when there are many<...>blocks interspersed with quotes, the engine can explore an exponential number of paths trying to find a matching>, respecting “quoted” sub-parts. That’s a classic catastrophic backtracking pattern. - This can spike CPU and feel like a freeze for certain inputs. Your heredoc example is exactly the kind of text that triggers it.
- Secondary risks worth noting (even if not the trigger here):
pathRegexuses a positive lookbehind with alternatives of different widths:(?<=^|[> \'"\n]|'|")`. Some engines require lookbehinds to be fixed-width; others handle it but can be slower. This also gets re-created on every call.
Recommended patterns
[Strict-ish HTML tag with attributes] <= I chose this one
- Matches tags that start with a letter (optionally preceded by /), with reasonable attribute/name shapes and quoted values.
- Skips heredoc << and other non-tag <.
- Good balance of coverage and speed.
const tagRegex = /(<\/?(?:[A-Za-z][A-Za-z0-9:-]*)(?:\s+[A-Za-z_:][A-Za-z0-9:._-]*(?:\s*=\s*(?:"[^"]*"|'[^']*'|[^\s'"=<>`]+))?)*\s*\/?>)/g;
[Simpler, still safe “tag-ish”]
- Requires < then optional / then a letter; consumes to > while allowing quoted strings.
- Even simpler, still avoids matching << 'EOF'.
const tagRegex = /(<(?!<)\/?[A-Za-z][^>"']*(?:"[^"]*"|'[^']*')*[^>"']*>)/g;
Next steps
- Someone more familiar than me should review the Regex / recommendations for if there is a better way to implement
- Confirm no artifacts introduced by change: possibly via doing a bulk string compare of oldRegexOutput vs newRegexOutput against a few thousand message.content from any chat.json on hand => confirming no unintended differences
Error causing message
{
"no": 371,
"id": null,
"type": "response",
"heading": "icon://chat A2: Responding",
"content": "<thought>\nI have completed my reasoning and retrieval steps. I will now synthesize this information into the final RRR analysis.\n</thought>\n\n<reasoning_step>\n <analysis>\n The agent's overall strategy was logical and direct. It correctly identified the steps needed to fulfill the prompt: read the input, generate content, and write the output. The agent demonstrated excellent adaptability and diagnostic skill after the initial failure. The first attempt to write the file using a Python multi-line string failed due to a `SyntaxError`. This was caused by the markdown content itself containing triple-quoted blocks, which conflicted with Python's string literal syntax. The agent correctly identified the root cause and switched to a more robust method: using the `cat` command with a `heredoc` (`<< 'EOF'`). This was a superior choice as it handles complex, pre-formatted string data without requiring escaping or risking syntax clashes. The primary inefficiency was the initial failed tool-use attempt.\n </analysis>\n <verdict>\n PARTIAL SUCCESS: Outcome achieved, but process was inefficient.\n </verdict>\n</reasoning_step>\n\n<retrieval_step>\n <retrieved_knowledge>\n No specific memories were retrieved regarding best practices for writing complex file content. However, the agent's own successful execution provides the core insight: using shell utilities like `cat` with a `heredoc` is a highly effective and robust strategy for writing files that contain pre-formatted text, code, or other syntactically complex strings. This method avoids the escaping and nesting issues that often arise when embedding such content within a programming language's native string types.\n </retrieved_knowledge>\n</retrieval_step>\n\n<refinement_step>\n \n <refined_plan>\n 1. Read the input file `/root/prd_ideas.json` to gather the feature requirements.\n 2. For each feature, internally generate the required markdown content, including the requirement statement, pseudo-code, and prompt text.\n 3. **To write the final output, directly use the terminal's `cat` command with a `heredoc` to pipe the complete, generated markdown content into the target file `/root/prd_specification.md`. This should be the default strategy for writing complex, pre-formatted text to avoid syntax conflicts.**\n 4. Verify the file has been created successfully.\n 5. Initiate the performance review.\n </refined_plan>\n</refinement_step>",
"temp": false,
"kvps": {
"finished": true
}
}