needle
needle copied to clipboard
utf8 redirect breaks stuff
Part 1: error
There is a server on the internet that shows following headers:
$ curl -I http://59.gibdd.ru
HTTP/1.1 301 Moved Permanently
Date: Mon, 13 Sep 2021 15:51:29 GMT
Server: Apache
Location: https://гибдд.ÑÑ/r/59/
Vary: Accept-Encoding
Content-Type: text/html; charset=iso-8859-1
I'm trying to follow that redirect, and it fails:
require('needle')('get','http://59.gibdd.ru/',{follow:10}).
then(()=>console.log('ok'),()=>console.log('error'))
It produces this:
> Uncaught TypeError [ERR_INVALID_URL]: Invalid URL: https://óøñôô.ÃÃ
/r/59/
at onParseError (internal/url.js:259:9)
at new URL (internal/url.js:335:5)
at resolve_url (/tmp/node_modules/needle/lib/needle.js:165:12)
at ClientRequest.<anonymous> (/tmp/node_modules/needle/lib/needle.js:582:28)
at Object.onceWrapper (events.js:422:26)
at ClientRequest.emit (events.js:315:20)
at ClientRequest.EventEmitter.emit (domain.js:529:15)
at HTTPParser.parserOnIncomingClient [as onIncoming] (_http_client.js:641:27)
at HTTPParser.parserOnHeadersComplete (_http_common.js:126:17)
at Socket.socketOnData (_http_client.js:509:22) {
input: 'https://óøñôô.Ã\x80Ã\x84/r/59/',
code: 'ERR_INVALID_URL'
}
This error is not intercepted by needle and is thrown into the void (or rather node.js own unhandledException handler).
Promise is never resolved or rejected, and user has no way of intercepting it.
Part 2: header encoding
Sometimes someone upstream makes a decision that every single downstream developer has to fix on their own. Sad fact of life. So...
Turns out, node.js itself returns headers in binary encoding, as explained here: https://github.com/nodejs/node/issues/7928
> require('needle')('get','http://59.gibdd.ru/').then(res=>{console.log(res.headers)})
Promise { <pending> }
> {
date: 'Mon, 13 Sep 2021 15:55:57 GMT',
server: 'Apache',
location: 'https://óøñôô.Ã\x80Ã\x84/r/59/',
vary: 'Accept-Encoding',
'content-length': '298',
connection: 'close',
'content-type': 'text/html; charset=iso-8859-1'
}
But if you try to parse that URL, you'll get either incorrect result (with url.parse
) or an exception (with new URL
).
Got resolved this by using Buffer.from(res.headers.location, 'binary').toString()
instead of res.headers.location
as shown here: https://github.com/sindresorhus/got/pull/214
Needle fails here: https://github.com/tomas/needle/blob/b4913a5d77afbdcaa49ceaa3bf6b34706d2b1bbf/lib/needle.js#L582
So instead of:
var redirect_url = resolve_url(headers.location, uri);
It should be:
var location = Buffer.from(headers.location, 'binary').toString();
var redirect_url = resolve_url(location, uri);
User should probably get fixed location header as well (shouldn't break api because it was never working in the first place): https://github.com/tomas/needle/blob/b4913a5d77afbdcaa49ceaa3bf6b34706d2b1bbf/lib/needle.js#L555
Part 3: intercepting an error
Even if we fix encoding on a legitimate redirect, someone can intentionally or accidentally throw in something that breaks new URL
syntax.
> new url.URL('http://\xB8', 'http://google.com');
Uncaught TypeError [ERR_INVALID_URL]: Invalid URL: http://¸
So I believe the call to new URL
should be wrapped in try..catch
and error thrown whenever request errors would normally go.
test
@tomas , could you take a look? Unhandled error is a serious problem. If you have no time - we can do PR. We need new release with fix. Error handling - mandatory, redirect encoding - if possible.
@tomas can we help anyhow? Fix of this issue is important for us.
http headers are expected to be ISO-8859-1. the server sends the location header in utf-8, the http module interprets it as latin1 and we end up with mojibake. most browsers these days gracefully accept headers in utf-8 and don't break. i think it's worth doing the same.
@yetzt note, bad headers is not a big deal. Problem is in unhandled exception and loss of promise end. First post contains all details how to fix with minimal efforts.
if so, open a pull request. @puzrin