needle icon indicating copy to clipboard operation
needle copied to clipboard

utf8 redirect breaks stuff

Open rlidwka opened this issue 3 years ago • 6 comments

Part 1: error

There is a server on the internet that shows following headers:

$ curl -I http://59.gibdd.ru
HTTP/1.1 301 Moved Permanently
Date: Mon, 13 Sep 2021 15:51:29 GMT
Server: Apache
Location: https://гибдд.рф/r/59/
Vary: Accept-Encoding
Content-Type: text/html; charset=iso-8859-1

I'm trying to follow that redirect, and it fails:

require('needle')('get','http://59.gibdd.ru/',{follow:10}).
  then(()=>console.log('ok'),()=>console.log('error'))

It produces this:

> Uncaught TypeError [ERR_INVALID_URL]: Invalid URL: https://гибдд.ÑÑ
                                                                          /r/59/
    at onParseError (internal/url.js:259:9)
    at new URL (internal/url.js:335:5)
    at resolve_url (/tmp/node_modules/needle/lib/needle.js:165:12)
    at ClientRequest.<anonymous> (/tmp/node_modules/needle/lib/needle.js:582:28)
    at Object.onceWrapper (events.js:422:26)
    at ClientRequest.emit (events.js:315:20)
    at ClientRequest.EventEmitter.emit (domain.js:529:15)
    at HTTPParser.parserOnIncomingClient [as onIncoming] (_http_client.js:641:27)
    at HTTPParser.parserOnHeadersComplete (_http_common.js:126:17)
    at Socket.socketOnData (_http_client.js:509:22) {
  input: 'https://гибдд.Ñ\x80Ñ\x84/r/59/',
  code: 'ERR_INVALID_URL'
}

This error is not intercepted by needle and is thrown into the void (or rather node.js own unhandledException handler).

Promise is never resolved or rejected, and user has no way of intercepting it.

Part 2: header encoding

Sometimes someone upstream makes a decision that every single downstream developer has to fix on their own. Sad fact of life. So...

Turns out, node.js itself returns headers in binary encoding, as explained here: https://github.com/nodejs/node/issues/7928

> require('needle')('get','http://59.gibdd.ru/').then(res=>{console.log(res.headers)})
Promise { <pending> }
> {
  date: 'Mon, 13 Sep 2021 15:55:57 GMT',
  server: 'Apache',
  location: 'https://гибдд.Ñ\x80Ñ\x84/r/59/',
  vary: 'Accept-Encoding',
  'content-length': '298',
  connection: 'close',
  'content-type': 'text/html; charset=iso-8859-1'
}

But if you try to parse that URL, you'll get either incorrect result (with url.parse) or an exception (with new URL).

Got resolved this by using Buffer.from(res.headers.location, 'binary').toString() instead of res.headers.location as shown here: https://github.com/sindresorhus/got/pull/214

Needle fails here: https://github.com/tomas/needle/blob/b4913a5d77afbdcaa49ceaa3bf6b34706d2b1bbf/lib/needle.js#L582

So instead of:

var redirect_url = resolve_url(headers.location, uri);

It should be:

var location = Buffer.from(headers.location, 'binary').toString();
var redirect_url = resolve_url(location, uri);

User should probably get fixed location header as well (shouldn't break api because it was never working in the first place): https://github.com/tomas/needle/blob/b4913a5d77afbdcaa49ceaa3bf6b34706d2b1bbf/lib/needle.js#L555

Part 3: intercepting an error

Even if we fix encoding on a legitimate redirect, someone can intentionally or accidentally throw in something that breaks new URL syntax.

> new url.URL('http://\xB8', 'http://google.com');
Uncaught TypeError [ERR_INVALID_URL]: Invalid URL: http://¸

So I believe the call to new URL should be wrapped in try..catch and error thrown whenever request errors would normally go.

rlidwka avatar Sep 13 '21 16:09 rlidwka

test

sessionsKrammer avatar Sep 15 '21 00:09 sessionsKrammer

@tomas , could you take a look? Unhandled error is a serious problem. If you have no time - we can do PR. We need new release with fix. Error handling - mandatory, redirect encoding - if possible.

puzrin avatar Sep 18 '21 12:09 puzrin

@tomas can we help anyhow? Fix of this issue is important for us.

puzrin avatar Sep 24 '21 15:09 puzrin

http headers are expected to be ISO-8859-1. the server sends the location header in utf-8, the http module interprets it as latin1 and we end up with mojibake. most browsers these days gracefully accept headers in utf-8 and don't break. i think it's worth doing the same.

yetzt avatar Oct 19 '21 07:10 yetzt

@yetzt note, bad headers is not a big deal. Problem is in unhandled exception and loss of promise end. First post contains all details how to fix with minimal efforts.

puzrin avatar Oct 19 '21 08:10 puzrin

if so, open a pull request. @puzrin

yetzt avatar Oct 19 '21 09:10 yetzt