Mixed output causes httpie to preprocess it incorrectly
Checklist
- [x] I've searched for similar issues.
- [x] I'm using the latest version of HTTPie.
Minimal reproduction code and steps
- Create a request to a service that returns MIME type text/html with a JSON body and escaped HTML inside a string
- Observe the HTML getting highlighted and characters converted to their unescaped versions
- Compare with piping to cat to remove preprocessing, where the characters are left as they are
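The steps above can also be tried without an external service. The following is a minimal sketch using only the Python standard library; the handler name, the make_server helper, and port 8000 are assumptions made for illustration and are not part of the original report.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# JSON body whose string value contains escaped HTML (\u003c stands for "<").
BODY = b'{"name":"example.com\\u003cscript\\u003ealert(1)\\u003c/script\\u003e."}'


class MislabeledJSONHandler(BaseHTTPRequestHandler):
    """Serve a JSON body while labeling it as text/html."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=UTF-8")
        self.send_header("Content-Length", str(len(BODY)))
        self.end_headers()
        self.wfile.write(BODY)

    def log_message(self, format, *args):
        # Keep the terminal quiet so HTTPie's output is easy to compare.
        pass


def make_server(port=8000):
    """Hypothetical helper: build a local server on the given port."""
    return HTTPServer(("127.0.0.1", port), MislabeledJSONHandler)
```

Calling make_server().serve_forever() starts the server; running http -v http://localhost:8000/ with and without | cat should then show the two renderings side by side.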
Current result
For example, proxy dns.google but set its returned Content-Type to text/html (proxy_pass https://dns.google; add_header Content-Type text/html always; in nginx).
http "http://localhost/resolve?name=example.com%3Cscript%3Ealert(1)%3C%2Fscript%3E" -v | cat
GET /resolve?name=example.com%3Cscript%3Ealert(1)%3C%2Fscript%3E HTTP/1.1
Accept-Encoding: gzip, deflate, br
Accept: */*
Connection: keep-alive
User-Agent: HTTPie/3.2.4
Host: dns.google
HTTP/1.1 200 OK
X-Content-Type-Options: nosniff
Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
Access-Control-Allow-Origin: *
Date: Sat, 08 Mar 2025 11:22:11 GMT
Expires: Sat, 08 Mar 2025 11:22:11 GMT
Cache-Control: private, max-age=86399
Content-Type: text/html; charset=UTF-8
Content-Encoding: gzip
Server: HTTP server (unknown)
X-XSS-Protection: 0
X-Frame-Options: SAMEORIGIN
Alt-Svc: h3=":443"; ma=2592000,h3-29=":443"; ma=2592000
Transfer-Encoding: chunked
{"Status":3,"TC":false,"RD":true,"RA":true,"AD":true,"CD":false,"Question":[{"name":"example.com\u003cscript\u003ealert(1)\u003c/script\u003e.","type":1}],"Authority":[{"name":".","type":6,"TTL":86399,"data":"a.root-servers.net. nstld.verisign-grs.com. 2025030800 1800 900 604800 86400"}]}
is the raw output, but without |cat it gets rendered as
{
"AD": true,
"Authority": [
{
"TTL": 86397,
"data": "a.root-servers.net. nstld.verisign-grs.com. 2025030800 1800 900 604800 86400",
"name": ".",
"type": 6
}
],
"CD": false,
"Question": [
{
"name": "example.com<script>alert(1)</script>.",
"type": 1
}
],
"RA": true,
"RD": true,
"Status": 3,
"TC": false
}
which is incorrect and can be confusing: the escaped \u003c and \u003e sequences in the body are shown as literal < and > characters.
Expected result
Same as |cat output as there's no real HTML to prettify
Additional information, screenshots, or code examples
I’d like to clarify what’s going on under the hood:
By-design behavior for text/html
HTTPie treats any response labeled Content-Type: text/html as “opaque” text, so when you request pretty-printed JSON with --json it still (a) syntax-highlights it as HTML, and (b) hands the raw Python object to json.dumps(..., ensure_ascii=False). That parameter is explicitly chosen to improve human readability by unescaping \uXXXX sequences into their corresponding characters.
Why it feels like a bug
It only surfaces when a server mislabels a JSON payload as text/html. Because the JSON body contains escaped HTML ("\u003c"), you end up seeing < in the output, even though the original JSON literally contained \u003c.
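The round trip described here can be sketched with the Python standard library alone. This is not HTTPie's code path, only the behavior of json.loads and json.dumps on a shortened version of the body. In the standard library, ensure_ascii only controls whether non-ASCII characters are escaped; since < is ASCII, it comes out as a literal character with either setting, which is worth keeping in mind for any ensure_ascii-based fix.

```python
import json

# Shortened body from the report; \u003c and \u003e are escaped "<" and ">".
raw = '{"name":"example.com\\u003cscript\\u003e."}'

# Parsing decodes the escapes into real characters.
parsed = json.loads(raw)

# Re-serializing emits "<" and ">" literally, whatever ensure_ascii is set to.
as_ascii = json.dumps(parsed, ensure_ascii=True)
as_unicode = json.dumps(parsed, ensure_ascii=False)

# For contrast: a non-ASCII character is where ensure_ascii makes a difference.
accent_ascii = json.dumps("\u00e9", ensure_ascii=True)     # escaped as \u00e9
accent_unicode = json.dumps("\u00e9", ensure_ascii=False)  # literal character

print(as_ascii)
print(as_unicode)
```

Both print calls produce the same line containing a literal <script>, so the original \u003c spelling is already lost at the json.loads step.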
Options to preserve your escapes
- Fix upstream: Have your server use the correct Content-Type: application/json; charset=utf-8. Then HTTPie will (correctly) call json.dumps(..., ensure_ascii=True), preserving all \uXXXX sequences.
- Workaround in HTTPie: You could add a flag (or patch) around that one call site in json.py to force ensure_ascii=True when you detect --json, or introduce a new option like --preserve-escapes.
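As a rough illustration of what an option like --preserve-escapes might do, here is a sketch of a post-processing step. The name dumps_html_escaped is hypothetical and nothing here is an existing HTTPie API. It relies on the fact that in serialized JSON the characters <, > and & can only occur inside strings, so replacing them across the whole output keeps the document valid.

```python
import json

# Hypothetical mapping, not taken from HTTPie: HTML-significant characters
# and the JSON escapes to emit in their place.
HTML_ESCAPES = {"<": "\\u003c", ">": "\\u003e", "&": "\\u0026"}


def dumps_html_escaped(obj, **kwargs):
    """Serialize obj like json.dumps, then re-escape <, > and &."""
    text = json.dumps(obj, **kwargs)
    for char, escape in HTML_ESCAPES.items():
        text = text.replace(char, escape)
    return text


parsed = json.loads('{"name":"example.com\\u003cscript\\u003e."}')
escaped = dumps_html_escaped(parsed, indent=4, sort_keys=True)
print(escaped)
```

Note that this escapes every such character, including any the server sent literally, so it normalizes the output rather than reproducing the original bytes. Printing the body without parsing it, as |cat effectively does, keeps the server's spelling exactly.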
Conclusion
The premature unescaping is indeed happening in HTTPie, but it’s an intentional readability feature for non-JSON content. The “real” bug is on the server side sending the wrong Content-Type header. If you’re blocked by a server you can’t change, we could consider adding a new HTTPie option to preserve all escapes regardless of content type. Let me know if you’d like to collaborate on implementing that!