[AutoRAG] streaming sometimes emits numeric JSON for response chunks, dropping adjacent whitespace/newlines and corrupting formatting
Summary
When streaming from AutoRAG, most frames contain a string response field as expected, but intermittently a frame encodes the token as a JSON number (for example, 155724 or 3e+50). When that happens, whitespace or newlines adjacent to the token are lost, which corrupts Markdown and code rendering downstream.
Environment
Product: Cloudflare AutoRAG (streaming mode)
Model: cf/meta/llama-3.3-70b-instruct-fp8-fast
Client: Server-sent events over HTTPS, parsed in a TypeScript/Node.js app
Runtime: Node.js v20.x (production and local development)
Steps to reproduce
Enable streaming on an AutoRAG assistant-style request over SSE.
Prompt the model to produce a Mermaid diagram whose classDef lines include CSS-style hex color literals, with explicit newlines between definitions.
Observe the SSE frames; most tokens arrive as strings, but intermittently a token that looks numeric is emitted as a JSON number instead of a JSON string.
When this occurs, the newline or whitespace adjacent to that token is missing from the reassembled text, and the Mermaid/Markdown formatting breaks.
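The check in the steps above can be automated. The sketch below is a hypothetical helper, findNonStringResponses, not part of any Cloudflare SDK; it assumes the raw SSE body is available as text and that each event's JSON payload sits on a single "data:" line, as in the frames quoted in this report. It flags every frame whose response field decodes to something other than a string.

```typescript
// Hypothetical detector for the anomaly described above. Assumes each SSE
// event carries its JSON payload on a single "data:" line.
interface AnomalousFrame {
  index: number; // position among data frames, starting at 0
  type: string; // result of typeof on the response field
  value: unknown; // the raw decoded value
}

function findNonStringResponses(sseText: string): AnomalousFrame[] {
  const anomalies: AnomalousFrame[] = [];
  let index = 0;
  for (const rawLine of sseText.split("\n")) {
    const line = rawLine.trim();
    if (!line.startsWith("data:")) continue; // skip blank lines and other SSE fields
    const payload = line.slice("data:".length).trim();
    if (payload === "[DONE]") continue; // end-of-stream sentinel, if one is sent (assumption)
    const frame: unknown = JSON.parse(payload);
    if (typeof frame === "object" && frame !== null && "response" in frame) {
      const response = (frame as { response: unknown }).response;
      if (typeof response !== "string") {
        anomalies.push({ index, type: typeof response, value: response });
      }
    }
    index++;
  }
  return anomalies;
}

// Usage: the three adjacent frames quoted in this report, "p" omitted.
const sample = [
  'data: {"response":"px,color:#","tool_calls":[]}',
  'data: {"response":155724,"tool_calls":[]}',
  'data: {"response":" classDef","tool_calls":[]}',
].join("\n\n");
console.log(findNonStringResponses(sample)); // one anomaly: index 1, type "number", value 155724
```

Logging the index alongside the neighbouring frames makes it easier to attach exact evidence to a report like this one.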
Expected behavior
Every streamed token in the response field is serialized as a JSON string regardless of whether it looks numeric, including scientific-notation-shaped substrings such as 3e+50.
Whitespace and newline fidelity is preserved across chunk boundaries to maintain exact text formatting in markdown/code blocks.
Actual behavior
Occasionally, a frame encodes response as a JSON number (for example, 155724 or 3e+50). The whitespace or newline that belonged to that token is lost, which breaks code blocks and Mermaid diagrams.
Example
Intended source text from the model output:
```text
...
classDef startEnd fill:#ffffff,stroke:#2c3e50,stroke-width:3px,color:#2c3e50
classDef process fill:#ecf0f1,stroke:#34495e,stroke-width:2px,color:#2c3e50
classDef decision fill:#f8f9fa,stroke:#6c757d,stroke-width:2px,color:#495057
classDef system fill:#e8f5e8,stroke:#28a745,stroke-width:2px,color:#155724
classDef critical fill:#fff3cd,stroke:#856404,stroke-width:2px,color:#533f03
```
Observed output (excerpt). The frames at the end are adjacent: the color literal is split across them, and the numeric chunk is emitted as a number rather than a string:
```text
{
  "response": "155724 classDef error fill:#fff5f5,stroke:#dc3545,stroke-width:2px,color:#721c24\n \n class A,L,M startEnd\n class B,D,F,I,J process\n class C,G,H decision\n ...",
  "tool_calls": [],
  "usage": {
    "prompt_tokens": 6101,
    "completion_tokens": 650,
    "total_tokens": 6751
  },
  "streamed_data": [
    {
      "response": "",
      "tool_calls": [],
      "p": "abdefghijklmnoprstuvxyz1234"
    },
    {
      "response": "##",
      "tool_calls": [],
      "p": "abdefghijklmnoprstuvxyz"
    },
    {
      "response": " Summary",
      "tool_calls": [],
      "p": "abdefghijklmn"
    },
    {
      "response": "\n",
      "tool_calls": [],
      "p": "abdefghijklmnoprstuvxyz1234"
    },
    {
      "response": "The booking cancellation",
      "tool_calls": [],
      "p": "abdefghi"
    },
    {
      "response": " workflow is a",
      "tool_calls": [],
      "p": "abdefghijklmnoprstuvxyz1234567890abdefghijklmnopr"
    },
    {
      "response": " systematic",
      "tool_calls": [],
      "p": "abdefghijklmnoprstuvxyz1234567890abdefghijklm"
    },
    {
      "response": " process designed",
      "tool_calls": [],
      "p": "abdefg"
    },
    ...
    { "response": "px,color:#", "tool_calls": [], "p": "…" }
    { "response": 155724, "tool_calls": [], "p": "…" }
    { "response": " classDef", "tool_calls": [], "p": "…" }
```
Because this frame carries the number 155724 instead of the string "155724\n", the newline that should follow the hex digits is not present in the frame at all, so ordinary string concatenation on the client joins the color literal with the following classDef line. Every later line shifts as a result, and the Mermaid block no longer parses.
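One hypothesis that would explain both reported values (unconfirmed; the server-side serializer is not visible from the client): the streaming layer tries JSON.parse on each token's text and emits the parsed value whenever parsing succeeds. JSON permits whitespace around a value, so the token "155724\n" parses cleanly to the number 155724 and its newline is discarded, while "3e+50" likewise parses to a number. The hypothetical coerceToken below reproduces the observed frames under that assumption.

```typescript
// HYPOTHETICAL reconstruction of a "parse if possible" serializer.
// Nothing here is taken from AutoRAG's code; it only models the symptom.
function coerceToken(token: string): unknown {
  try {
    return JSON.parse(token); // "155724\n" -> 155724, "3e+50" -> 3e+50
  } catch {
    return token; // "px,color:#" is not valid JSON, so it stays a string
  }
}

// Server side (hypothetical): one frame per token.
const tokens = ["px,color:#", "155724\n", " classDef"];
const frames = tokens.map((t) => JSON.stringify({ response: coerceToken(t) }));
// frames[1] === '{"response":155724}'

// Client side: naive concatenation of the decoded response fields.
const rebuilt = frames
  .map((f) => (JSON.parse(f) as { response: unknown }).response)
  .join("");
// rebuilt === "px,color:#155724 classDef"; the newline is gone
```

If this is the cause, the fix is to skip the parse step and treat token text as opaque.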
Impact
Markdown and code blocks render incorrectly. Post-processing pipelines that expect strings fail or silently degrade output fidelity.
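Until the server emits strings consistently, a client can avoid crashing on numeric frames by coercing the value with String(). The hypothetical appendChunk accumulator below does this, but it only keeps pipelines running: it cannot restore the lost newline, because the number never contained it, and a long digit run beyond double precision would also be rounded by JSON.parse on the client.

```typescript
type StreamFrame = { response?: unknown };

// Hypothetical defensive accumulator: accepts string or numeric response
// values so downstream string handling does not throw.
function appendChunk(acc: string, frame: StreamFrame): string {
  const r = frame.response;
  if (typeof r === "string") return acc + r;
  if (typeof r === "number") {
    // String(155724) === "155724": the digits survive, the newline does not.
    return acc + String(r);
  }
  return acc; // missing or unexpected types are skipped
}

const frames: StreamFrame[] = [
  { response: "px,color:#" },
  { response: 155724 },
  { response: " classDef" },
];
const text = frames.reduce(appendChunk, "");
// text === "px,color:#155724 classDef"
```

This is a stopgap for robustness, not a fix for the formatting corruption.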
Requested Fix
Ensure the streaming server always emits token text in a string field, regardless of token contents.
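The requested fix amounts to one invariant: the token text goes into the response field untouched and is serialized as a string, so decoding a frame always returns exactly the text that was encoded. A minimal sketch, assuming the server builds each frame from the raw token text; encodeFrame is hypothetical, the field names come from the frames in this report, and the p field is omitted for brevity.

```typescript
interface Frame {
  response: string;
  tool_calls: unknown[];
}

// Hypothetical frame encoder: the token text is stored as-is, never parsed,
// so JSON.stringify always emits a JSON string and escapes a newline as the
// two characters backslash and n.
function encodeFrame(tokenText: string): string {
  const frame: Frame = { response: tokenText, tool_calls: [] };
  return `data: ${JSON.stringify(frame)}\n\n`;
}

// Round-trip check over tokens that look numeric.
const samples = ["155724\n", "3e+50", "px,color:#", " classDef"];
for (const s of samples) {
  const line = encodeFrame(s);
  const decoded = JSON.parse(line.slice("data: ".length)) as Frame;
  if (typeof decoded.response !== "string" || decoded.response !== s) {
    throw new Error(`round-trip failed for ${JSON.stringify(s)}`);
  }
}
```

A round-trip test like this over numeric-looking tokens would catch a regression of this bug.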
Additional context
This occurs more frequently during long generations that include code or diagram blocks with many numeric substrings (for example, hex colors), which aligns with typical AutoRAG assistant streaming patterns.
Please assist! Thank you