
Mixed output causes httpie to preprocess it incorrectly

Open danbulant opened this issue 9 months ago • 1 comments

Checklist

  • [x] I've searched for similar issues.
  • [x] I'm using the latest version of HTTPie.

Minimal reproduction code and steps

  1. Send a request to a service that returns the MIME type text/html with a JSON body containing escaped HTML inside a string
  2. Observe that the HTML is highlighted and the escaped characters are converted to their unescaped forms
  3. Compare with piping to cat, which disables preprocessing and leaves the characters as they are
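
For anyone without an nginx proxy at hand, the steps above can probably be reproduced with a small local server. This is only a sketch using Python's standard http.server module; the handler name, the port, and the shortened body are assumptions made for illustration, not part of the original report.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# JSON body with "<" and ">" escaped as \u003c / \u003e, modelled on the
# dns.google response (shortened; the exact fields are illustrative).
BODY = rb'{"Status":3,"Question":[{"name":"example.com\u003cscript\u003ealert(1)\u003c/script\u003e.","type":1}]}'


class MislabelledJSONHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that serves JSON but labels it as text/html."""

    def do_GET(self):
        self.send_response(200)
        # Deliberately wrong content type, as in the report.
        self.send_header("Content-Type", "text/html; charset=UTF-8")
        self.send_header("Content-Length", str(len(BODY)))
        self.end_headers()
        self.wfile.write(BODY)


def serve(port=8000):
    """Serve the mislabelled body on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), MislabelledJSONHandler).serve_forever()
```

Calling serve() and then running http :8000/ with and without | cat should show the same difference as in the report.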

Current result

For example, proxy dns.google but set its returned Content-Type to text/html (proxy_pass https://dns.google; add_header Content-Type text/html always; in nginx).

http "http://localhost/resolve?name=example.com%3Cscript%3Ealert(1)%3C%2Fscript%3E" -v | cat
GET /resolve?name=example.com%3Cscript%3Ealert(1)%3C%2Fscript%3E HTTP/1.1
Accept-Encoding: gzip, deflate, br
Accept: */*
Connection: keep-alive
User-Agent: HTTPie/3.2.4
Host: dns.google

HTTP/1.1 200 OK
X-Content-Type-Options: nosniff
Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
Access-Control-Allow-Origin: *
Date: Sat, 08 Mar 2025 11:22:11 GMT
Expires: Sat, 08 Mar 2025 11:22:11 GMT
Cache-Control: private, max-age=86399
Content-Type: text/html; charset=UTF-8
Content-Encoding: gzip
Server: HTTP server (unknown)
X-XSS-Protection: 0
X-Frame-Options: SAMEORIGIN
Alt-Svc: h3=":443"; ma=2592000,h3-29=":443"; ma=2592000
Transfer-Encoding: chunked

{"Status":3,"TC":false,"RD":true,"RA":true,"AD":true,"CD":false,"Question":[{"name":"example.com\u003cscript\u003ealert(1)\u003c/script\u003e.","type":1}],"Authority":[{"name":".","type":6,"TTL":86399,"data":"a.root-servers.net. nstld.verisign-grs.com. 2025030800 1800 900 604800 86400"}]}

is the raw output, but without | cat it gets rendered as

{
    "AD": true,
    "Authority": [
        {
            "TTL": 86397,
            "data": "a.root-servers.net. nstld.verisign-grs.com. 2025030800 1800 900 604800 86400",
            "name": ".",
            "type": 6
        }
    ],
    "CD": false,
    "Question": [
        {
            "name": "example.com<script>alert(1)</script>.",
            "type": 1
        }
    ],
    "RA": true,
    "RD": true,
    "Status": 3,
    "TC": false
}

which is incorrect and can be confusing

Expected result

Same as the | cat output, as there's no real HTML to prettify.

Additional information, screenshots, or code examples

Image

danbulant avatar Mar 08 '25 11:03 danbulant

I’d like to clarify what’s going on under the hood:

By-design behavior for text/html

HTTPie treats any response labeled Content-Type: text/html as “opaque” text, so when you request pretty-printed JSON with --json it still (a) syntax-highlights it as HTML, and (b) hands the raw Python object to json.dumps(..., ensure_ascii=False). That parameter is explicitly chosen to improve human readability by unescaping \uXXXX sequences into their corresponding characters.

Why it feels like a bug

It only surfaces when a server mislabels a JSON payload as text/html. Because the JSON body contains escaped HTML ("\u003c"), you end up seeing < in the output, even though the original JSON literally contained \u003c.
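
The round trip can be checked with Python's standard json module alone. The snippet below is a minimal sketch of that parse-and-reserialize step, not a trace of HTTPie's actual code path, and the sample body is made up for illustration. It suggests that the escapes are already decoded by json.loads, and that ensure_ascii only affects non-ASCII characters, so ensure_ascii=True by itself does not bring \u003c back.

```python
import json

# Raw body as a server might send it: "<" and ">" escaped, plus one non-ASCII escape.
raw = r'{"name":"example.com\u003cscript\u003e","note":"caf\u00e9"}'

# Parsing decodes every \uXXXX escape into the character it stands for.
parsed = json.loads(raw)

# Re-serializing cannot know which characters were escaped in the input.
readable = json.dumps(parsed, ensure_ascii=False)
ascii_only = json.dumps(parsed, ensure_ascii=True)

print(readable)    # "<" and ">" appear literally, and so does the accented character
print(ascii_only)  # only the non-ASCII character is escaped again; "<" and ">" stay literal
```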

Options to preserve your escapes

  • Fix upstream: Have your server use the correct Content-Type: application/json; charset=utf-8. Then HTTPie will (correctly) call json.dumps(..., ensure_ascii=True), preserving all \uXXXX sequences.

  • Workaround in HTTPie: You could add a flag (or patch) around that one call site in json.py to force ensure_ascii=True when you detect --json, or introduce a new option like --preserve-escapes.
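
As a rough idea of what such an option might do, here is a sketch of a helper that re-escapes HTML-significant characters after serialization. The name dumps_html_safe and the escape table are hypothetical and nothing like this exists in HTTPie as far as this thread shows. One caveat: Python's ensure_ascii=True leaves ASCII characters such as < untouched, so the helper post-processes the output instead, and it escapes every <, > and & rather than truly preserving whatever the server sent.

```python
import json

# Hypothetical escape table: characters that are significant in HTML.
HTML_ESCAPES = {"<": "\\u003c", ">": "\\u003e", "&": "\\u0026"}


def dumps_html_safe(obj, **kwargs):
    """Serialize obj to JSON, then escape <, > and & as \\uXXXX sequences.

    These characters are never structural in JSON, so in the serialized text
    they can only occur inside strings and a plain text replacement is safe.
    """
    text = json.dumps(obj, **kwargs)
    for char, escape in HTML_ESCAPES.items():
        text = text.replace(char, escape)
    return text


escaped = dumps_html_safe({"name": "example.com<script>alert(1)</script>."})
print(escaped)
```

The result is still valid JSON and parses back to the same object, so the change would only affect how the text is displayed.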

Conclusion

The premature unescaping is indeed happening in HTTPie, but it’s an intentional readability feature for non-JSON content. The “real” bug is on the server side sending the wrong Content-Type header. If you’re blocked by a server you can’t change, we could consider adding a new HTTPie option to preserve all escapes regardless of content type. Let me know if you’d like to collaborate on implementing that!

rhit-reillydj avatar May 23 '25 01:05 rhit-reillydj