forwarded icon indicating copy to clipboard operation
forwarded copied to clipboard

v1.0.0-rc.1

Open ahmadnassri opened this issue 10 years ago • 23 comments

This is essentially a complete re-write and expansion on the original functionality of v0.1.0

  • modular schema support, allows for easy future expansions
  • support RFC 7239 & X-Forwarded-* as primary trusted schemas
  • allows for implementers to choose schemas to process by passing an options object
  • this is a breaking api change. where v0.1.0 returns an array of IPs, v1.0.0 returns an object representing parameters of RFC 7239
    • the version increment is required to allow dependants libraries/implementers to continue to operate safely with out version breakage (assuming version resolving occurs with npm)
    • ref: RFC 7239 Parameters
  • coupled with support for less common schemas, named in association with vendor usage:
    • cloudflare
    • fastly
    • microsoft
    • nginx
    • zscaler
TODO
  • [x] test with all direct dependents & indirect dependents after updating to forwarded().addrs
    • hapi-forward has no tests
    • hapi-forwarded works
    • proxy-addr works (with minor fix to its own test)
    • express works
    • at this point, confident the remainder will also work.
  • [x] test node 0.6
  • [x] test with real http server / client
  • [ ] parse Forwarded header elements using RFC 7230 token syntax
  • [ ] start pull requests for dependents with appropriate changes. (post merge & release)

ahmadnassri avatar May 30 '15 07:05 ahmadnassri

proxy-addr works (with minor fix to its own test)

it's actually a bug in your PR here that it relieved. the issue is that you are using a simple mock for req, but it does not fully test it. req.connection.remotePort can legitimately be undefined.

Here is the follow-though using Node.js 0.10.24:

  1. Access to the remotePort getter at https://github.com/joyent/node/blob/v0.10.24/lib/net.js#L581
  2. Call this._getpeername() https://github.com/joyent/node/blob/v0.10.24/lib/net.js#L561
  3. Two places in the method return {} (https://github.com/joyent/node/blob/v0.10.24/lib/net.js#L563 and https://github.com/joyent/node/blob/v0.10.24/lib/net.js#L569).
  4. Return the port property from this._getpeername(); the two cases from #3 will thus return undefined for remotePort

dougwilson avatar May 30 '15 15:05 dougwilson

Another concern I have is why does this PR seem to be significantly slower? So this module does not have a benchmark suite, but proxy-addr does, so simply installing this PR into proxy-addr and changing the usage to forwarded(req).addrs results in the following difference:

#### forwarded 0.1.0

> node benchmark\matching.js

  1 test completed.
  2 tests completed.
  3 tests completed.
  4 tests completed.
  5 tests completed.
  6 tests completed.

  trust none     x 864,390 ops/sec ±1.62% (175 runs sampled)
  trust all      x 909,210 ops/sec ±1.55% (176 runs sampled)
  trust single   x  18,817 ops/sec ±1.37% (171 runs sampled)
  trust first    x 949,981 ops/sec ±1.30% (176 runs sampled)
  trust subnet   x  18,371 ops/sec ±1.55% (173 runs sampled)
  trust multiple x  18,834 ops/sec ±1.44% (174 runs sampled)
### forwarded PR

> node benchmark\matching.js

  1 test completed.
  2 tests completed.
  3 tests completed.
  4 tests completed.
  5 tests completed.
  6 tests completed.

  trust none     x 54,219 ops/sec ±1.54% (180 runs sampled)
  trust all      x 52,660 ops/sec ±1.75% (175 runs sampled)
  trust single   x 13,887 ops/sec ±1.55% (177 runs sampled)
  trust first    x 53,196 ops/sec ±1.36% (177 runs sampled)
  trust subnet   x 13,665 ops/sec ±1.39% (175 runs sampled)
  trust multiple x 13,615 ops/sec ±1.51% (180 runs sampled)

The matching benchmark is the closest one that measures just the forwarded module performance (mainly the "trust none", "trust all", and "trust first", which have almost no overhead other than simply calling forwarded).

As you can see above, based on those three benchmarks, performance dropped significantly. Since this module would theoretically be called on every request to a web server, it has a direct impact on the maximum req/sec a single Node.js instance can serve.

I understand, of course, that the current implement was very simply, but I did not expect to see it drop all the way down to 55k ops/sec. My main question is: can we do better? What are the bottlenecks in this code? Can v8 optimize functions, or does it keep bailing out for some reason?

This note on performance is not technically a blocker, though, but just something I noted and think one of us should at least glance into :)

dougwilson avatar May 30 '15 16:05 dougwilson

Another concern I have is why does this PR seem to be significantly slower?

that is concerning, I did not expect much change in the way of performance, so it never occurred to me to apply a benchmark, I'll take a deeper dive into this and find the bottleneck.

thanks for all the great feedback, I'll make the appropriate changes.

ahmadnassri avatar May 30 '15 20:05 ahmadnassri

thanks for all the great feedback

Absolutely! I wanted to get your the feedback quickly :) I noticed the repo was last changed in 2014, so I just make a few mundane updates to master; it is up to you if you want to merge it in your branch, rebase your branch on top, or simply do nothing w.r.t those updates :)

Also, with the significant code, please feel free to also add your name for 2015 in the LICENSE file.

dougwilson avatar May 30 '15 21:05 dougwilson

I noticed the repo was last changed in 2014, so I just make a few mundane updates to master; it is up to you if you want to merge it in your branch, rebase your branch on top, or simply do nothing w.r.t those updates :)

resolved in 1761f9d

Also, with the significant code, please feel free to also add your name for 2015 in the LICENSE file.

resolved in 7a3cb41

ahmadnassri avatar May 31 '15 00:05 ahmadnassri

node v0.6 successful tests with real server / client :tada:

ahmadnassri avatar May 31 '15 02:05 ahmadnassri

progress paused for a couple of days due to workload, still experimenting with bnf parsers and regex setup

ahmadnassri avatar Jun 02 '15 09:06 ahmadnassri

I wrote https://github.com/lpinca/forwarded-parse to parse the Forwarded header trying to follow the ABNF. It throws an error if the syntax is not valid. It needs to be tested a lot more and maybe improved, but if it might be of interest, I'm more than happy to give you full access to repo or move it under this organization.

lpinca avatar Jun 04 '15 20:06 lpinca

@lpinca that's great, i've been bogged down with work and this is the last remaining piece in this PR to fully test / solve (proper parsing of Forwarded) i'll review your module and perhaps merge it into lib/schemas/rfc7239.js with credits of course!

ahmadnassri avatar Jun 04 '15 20:06 ahmadnassri

oh my, I've let this one slip ... lets try to get this merged before its' year anniversary :stuck_out_tongue:

I'll review the work thus far, and look into the notes from @dougwilson whatever additional improvements are needed will be added.

ahmadnassri avatar Mar 26 '16 20:03 ahmadnassri

I've added some minor performance improvements and switched to using @lpinca forwarded-parse for better ANBF compatibility (though it seems to be very strict)

next steps:

  • [ ] fix travis issues for node 0.6, 0.8
  • [ ] add travis tests for recent node versions
  • [ ] increase code-coverage
  • [ ] add benchmark suite (in separate repo)
  • [ ] improve performance where possible
  • [ ] test alternative RFC 7239 parsing module and compare performance and spec compatibility

ahmadnassri avatar Mar 26 '16 23:03 ahmadnassri

@dougwilson any further direction from you to the todo list above?

ahmadnassri avatar Mar 26 '16 23:03 ahmadnassri

and switched to using @lpinca forwarded-parse for better ANBF compatibility

This part I don't understand. The entire purpose of this module is literally to do that operation, not depend on another module to do it. This module's name marched the name of the header for a reason...

dougwilson avatar Mar 26 '16 23:03 dougwilson

Basically I think there is a huge misunderstanding of the current separation of concerns between the modules here.

This module is meant only to be concerned with parsing/formatting of the HTTP "Forwarded" header (and I suppose the X headers that it unified)

The proxy-addr module is designed to determine the "proxied address" of a request, which could include configuring the source of proxy headers.

This PR seems to be heavily mixing those two together into this single library to me.

dougwilson avatar Mar 26 '16 23:03 dougwilson

A good litmus test in this organization is that if your code logically organizes into more than a single JavaScript file, that is multiple modules, not one. There are still some old, inherited modules that have a lib folder, but we are slowing working to get rid of that such that every module consists only of a single index.js file (for runtime code).

dougwilson avatar Mar 27 '16 00:03 dougwilson

Oh, I know I keep making comments but I keep thinking of new things to say :) I have not yet looked at the new changes, or even really looked though to remind myself of the existing changes, but I think all the features and stuff in here are good, but more though from us (and perhaps the technical committee that now oversees this organization) needs to be done around which changes go into which modules, what new modules should be created, etc.

dougwilson avatar Mar 27 '16 00:03 dougwilson

This part I don't understand. The entire purpose of this module is literally to do that operation, not depend on another module to do it. This module's name marched the name of the header for a reason...

I suppose I did not expand on my thoughts here... mea culpa, let me explain:

  • forwarded-parse seems to be doing a proper job of parsing the Forwarded header in accordance with the spec.
  • @lpinca has volunteered donating his work to this organization, which is great.
  • this original forwarded module focused on X-Forwarded-For header, and only concerned itself with the IP addresses, including the socket address for the request (though the latter part could be considered out of scope, if we're talking only about headers)
  • the RFC 7239 spec, introduces by, host, proto in addition to for as core parameters. (while also allowing for extensions)

so my thoughts up this point:

  • the parsing module if donated into the org can continue to live separately and focuses on the spec (while allowing extensions)
  • this module should focus on those core parameters, and any fallback that might be derived from defacto standards (such as X-Forwarded-* and X-Real-* headers) or vendor extensions used in popular cloud services.
  • the proxy-addr continues to focus on Proxy IP functionality, leveraging info from this module.

hence the separation and added dependency, mostly due to the extension parsing allowed by spec, as I'm a fan of separation of concerns.

ahmadnassri avatar Mar 27 '16 05:03 ahmadnassri

Oh, I know I keep making comments but I keep thinking of new things to say :)

all good, same here :)

more though from us (and perhaps the technical committee that now oversees this organization) needs to be done around which changes go into which modules, what new modules should be created, etc.

as I was reviewing the original work (almost a year old now!) I started thinking of better way to break this down too, I'm no longer a fan of having the vendor headers included in the core of this module.

I believe now, it should only concern it self with the RFC 7239 Spec and X-Forwarded-* only (meaning we can merge the work from forwarded-parse directly here.)

and the vendor specs can be additional, separate modules, that can be passed to this one for additional processing (e.g. building the stack of IPs in a desired order across multiple specs)

not sure what a good pattern for the latter though, or how it would propagate across higher dependencies for user-level control? (e.g. proxy-addr > connect > framework)

a simple pattern can perhaps just be an array of module names, or directly references functions (that in turn can be loaded from modules) that provide a common object { for, by, host, proto } that can then be used to build a stacked result.

ahmadnassri avatar Mar 27 '16 05:03 ahmadnassri

now that I'm back at a desktop keyboard, here's an example of how my previous thoughts might shape out:

minimal / default
const forwarded = require('forwarded') // included dependencies: RFC2739, X-Forwarded-*
let result = forwarded(request)
verbose
const forwarded = require('forwarded') 
const rfc2739 = require('forwarded-rfc2739')
const xforwarded = require('forwarded-x-forwarded')
const cloudflare = require('forwarded-cloudflare')
const fastly = require('forwarded-fastly')

// custom order
let options = {
  schemas: [cloudflare, rfc7239, fastly, xforwarded]
}

let result = forwarded(request, options)

ahmadnassri avatar Mar 27 '16 16:03 ahmadnassri

I think proxy-addr should be receiving those options, and this module is the one that does the parsing for one of those "schemas": Forwarded.

dougwilson avatar Mar 27 '16 19:03 dougwilson

would proxy-addr also deal with for, by, host, proto properties and data beyond just the "proxy address"?

ahmadnassri avatar Mar 27 '16 19:03 ahmadnassri

There is no reason why it cannot do that, since those are all basically part of the "proxied address". Ideally one should at least get enough information to construct the URL for the request, and just the hostname/IP is not enough information, because it could be on a non-default port, could be using HTTP or HTTPS, and more.

Typically if people just wanted to parse their headers, they would directly depend on the module that just does the header parsing they need. The proxy-addr module was created to glue the pure parsing modules together with trust chains and expose some kind of interface that modules like Express or koa could consume for what they need.

I think the long-term plans is that proxy-addr will be the module that will get all the address information: host, scheme/protocol, port, etc.

It used to also parse things like X-Forwarded-For, but that was doing too much, so it was split out into this module, with the desire to get parsing of the real Forwarded header instead, though that header doesn't seem to have taken off very much (yet), so there hasn't been any requests to Express/Koa to even get alternative headers as part of the core parsing, though it would be good to have.

Because we keep going back and forth on this discussion, I still have not even had time to lok at your new code or even refresh myself on the existing content of this pull request, sorry!

dougwilson avatar Mar 27 '16 20:03 dougwilson

Because we keep going back and forth on this discussion, I still have not even had time to lok at your new code or even refresh myself on the existing content of this pull request, sorry!

no worries, I think your point about proxy-addr is along the same lines of what I'm suggesting

I think we're debating semantics of where the logic sits.

The logic used here can be abstracted to proxy-addr and multiple modules / packages representing different "schems" can be created, then proxy-addr glues them together as you said.

TL;DR

I think we're saying the same thing. :)


I'll let you review the current code first, then we can discuss further.

ahmadnassri avatar Mar 27 '16 20:03 ahmadnassri