Good article on `x-forwarded-for` parsing
https://adam-p.ca/blog/2022/03/x-forwarded-for/
I’m going to be looking at this in substantially more depth, but this is probably worth looking at for this library as well to see if it’s worth adding references to the documentation.
What a fantastic article! It speaks to my soul. :sob: I definitely want to link it in the docs, at least. Thanks so much for forwarding it! And thanks @adam-p for writing it. ❤️
It also gives me some food for thought. One thing the post calls out is
A default list of places to look for the client IP makes no sense
which is something I was feeling the last time I was doing significant work on this library: https://github.com/ajvondrak/remote_ip/blob/4826fc437976d7e8dca1ac0ed1ebc7459c6b1a33/lib/remote_ip/options.ex#L33-L63
There's definitely an upgrade path we could follow to get people to specify their headers explicitly. Plus, maybe replace the :headers option with :header, because multiple headers get weird. Strictly speaking, you could plug RemoteIp multiple times with different :header values, which has the benefit of at least setting a precedence order. 🤔 Will have to think on it.
They also opened the issue at https://github.com/sebest/xff/issues/15 pointing out that their list of private IPs is incomplete, which is strictly true for remote_ip as well (the list originally came from the Rails implementation). Migrating that might be more of a pain, though.
You're welcome! I wanted the post to be useful to people just like you, so I'm super happy you found it.
This is among the best libraries I've seen. I really liked your algorithm document.
You've already found the post, so I'll only add a couple of things...
Generic comma-separated headers like
X-Forwarded-For,X-Real-Ip, andX-Client-Ip
Do you treat the latter two like a comma-separated list? (I think you do, but I have never tried to read Elixir.) They shouldn't be, but I guess there's no big downside. If the user has configured the lib to use one of those headers then they shouldn't be spoofable (a trusted reverse proxy set it) and treating it like a list-of-one is fine, and if the header is spoofable, then treating it like a list is the least of the user's problems. (It'd be nice to give an error or something to indicate to the user that something is wrong, but... again, if it's spoofable then it probably won't be spoofed with comma-separated values. Why would an attacker bother doing that?)
After writing the post I created a "reference implementation" in Go that you might be interested in: https://github.com/realclientip/realclientip-go I would really like to be able to point to implementations in other languages, so if yours ends up similar I'd be happy to add a stub realclientip-elixir repo to the organization that points to yours. (I mean, you'd also be welcome to move your repo under the org, but I doubt you will want to.)
@ajvondrak @adam-p I don’t have time to look this over now, but I’d be happy to provide an analysis of both codebases to see where they differ.
@ajvondrak, the short version is that remote_ip implements RightmostNonPrivateStrategy (more or less), but treats both the list headers and the single IP headers as equivalent.
What follows is mostly pulled from @adam-p's documentation and the implementation. I believe that it would be possible to do this with multiple plugs (for the use of single strategies). I have some thoughts about how this could be implemented, and it would depend on the ChainStrategy by default.
Before I get into my thoughts, let me cover the strategies in realclientip.
-
All strategies are configured for a single header, except
ChainStrategy, which is configured with a list of strategies. -
SingleIPHeaderStrategyderives an IP address from a single-IP header, such asX-Real-IP,CF-Connecting-IP,True-Client-IP,Fastly-Client-IP,X-Azure-ClientIP, andX-Azure-SocketIP.-
This strategy should be used when the given header is added by a trusted reverse proxy.
-
You must ensure that this header is not spoofable (as is possible with Akamai's use of True-Client-IP, Fastly's default use of Fastly-Client-IP, and Azure's X-Azure-ClientIP). See the single-IP wiki page for more info.
-
SingleIPHeaderStrategygrabs the last known value of the specified header from the headers list. This is a reasonable default as RFC 2616 does not allow multiple instances of single-IP headers (or any non-list header). It is debatable whether it is better to treat multiple such headers as an error (more correct) or simply pick one of them (more flexible). As we've already told the user tom make sure the header is not spoofable, we're going to use the last header instance if there are multiple. (Using the last is arbitrary, but in theory it should be the newest value.)
-
-
LeftmostNonPrivateStrategyderives the client IP from the leftmost valid and non-private IP address in theX-Forwarded-FororForwardedheader.-
This strategy should be used when a valid, non-private IP closest to the client is desired. Note that this MUST NOT BE USED FOR SECURITY PURPOSES. This IP can be TRIVIALLY SPOOFED.
-
The provided header name must be
X-Forwarded-FororForwarded. -
It combines all good IP addresses from all instances of the header into a single list and then loops through them in top-down, left-right order, returning the first non-private IP address.
-
-
RightmostNonPrivateStrategyderives the client IP from the rightmost valid, non-private/non-internal IP address in theX-Fowarded-FororForwardedheader.-
This strategy should be used when all reverse proxies between the internet and the server have private-space IP addresses.
-
The provided header name must be
X-Forwarded-FororForwarded. -
It combines all good IP addresses from all instances of the header into a single list and then loops through them in bottom-up, right-left order, returning the first non-private IP address.
-
-
RightmostTrustedCountStrategyderives the client IP from the valid IP address added by the first trusted reverse proxy to theX-Forwarded-FororForwardedheader.-
This Strategy should be used when there is a fixed number of trusted reverse proxies that are appending IP addresses to the header.
-
The provided header name must be
X-Forwarded-FororForwarded. -
The count must be a positive integer.
-
It combines all good IP addresses from all instances of the header into a single list. The entry returned is
(len(addresses) - 1) - (count - 1), or an empty value if there are fewer IPs than required or if the proxy didn't add a valid IP address. -
In Elixir terms, we would actually grab
at(reverse(addressess), count)(it may be(count + 1)).
-
-
RightmostTrustedRangeStrategyderives the client IP from the rightmost valid IP address in theX-Forwarded-FororForwardedheader which is not in a set of trusted IP ranges.-
This strategy should be used when the IP ranges of the reverse proxies between the internet and the server are known.
-
If a third-party WAF, CDN, etc., is used, you SHOULD use a method of verifying its access to your origin that is stronger than checking its IP address (e.g., using authenticated pulls). Failure to do so can result in scenarios like:
- You use AWS CloudFront in front of a server you host elsewhere. An attacker creates a CF distribution that points at your origin server. The attacker uses Lambda@Edge to spoof the Host and
X-Forwarded-Forheaders. Now your "trusted" reverse proxy is no longer trustworthy.
- You use AWS CloudFront in front of a server you host elsewhere. An attacker creates a CF distribution that points at your origin server. The attacker uses Lambda@Edge to spoof the Host and
-
The provided header name must be
X-Forwarded-FororForwarded. -
The trusted ranges must contain all trusted reverse proxies on the path to this server. The ranges can be private/internal or external (for example, if a third-party reverse proxy is used).
-
It combines all good IP addresses from all instances of the header into a single list and then loops through them in bottom-up, right-left order. Trusted IPs in that order are skipped and the first entry is returned.
-
-
ChainStrategyattempts to use the given strategies in order. If the first one returns an empty string, the second one is tried, and so on, until a good IP is found or the strategies are exhausted.- A common use for this is if a server is both directly connected to the internet and expecting a header to check.
Here’s my thoughts on how this could work in Elixir:
-
The strategies could be implemented similar to how they are implemented now, but would be deferred to strategy implementation modules (possibly using a
@behaviouras is done for the parsers). -
The implementations would in be something like
RemoteIp.Strategies.SingleIPHeaderStrategy,RemoteIp.Strategies.RightmostNonPrivateStrategy, etc. -
I would not implement
RemoteIp.Strategies.ChainStrategy, but instead would implement the plug as the wrapper to theChainStrategy. The configuration could be one of:-
A strategy-configuration tuple:
plug RemoteIp, {:rightmost_non_private, header: "x-forwarded-for"} -
A keyword list where the only key is
header(implies:rightmost_non_private):plug RemoteIp, header: "x-forwarded-for" -
A list of strategy-configuration tuples:
[ {:single_ip_header, header: "x-real-ip"}, {:rightmost_trusted_count, header: "forwarded", count: 3}, {:rightmost_non_private, header: "x-forwarded-for"}, ] -
The strategy configuration tuples could also be strategy modules (allowing custom extension with a behaviour):
[ {MyApp.MyStrategy, header: "forwarded"}, {RemoteIp.Strategies.SingleIPHeaderStrategy, header: "x-real-ip"}, {RemoteIp.Strategies.RightmostTrustedCountStrategy, header: "forwarded", count: 3}, {RemoteIp.Strategies.RightmostNonPrivateStrategy, header: "x-forwarded-for"}, ]
-
-
The implementation should probably be optimized to avoid parsing the wanted headers multiple times.
This is extremely doable.
Sorry for the tangent, but when you get a moment could you explain this to me:
You usually can't rely on servers to preserve the relative ordering of headers in the HTTP request. For example, the Cowboy server presently uses maps to represent headers, which don't preserve key order. The order in which we process IPs matters because we take that as the routing information for the request. So if you have multiple competing headers, the routing might be ambiguous, and you could get bad results.
The order of multiple XFF or Forwarded headers obviously matters (and surely doesn't ever get reordered, right?), but how does the order of other headers matter?
(An answer I can see being possible is: Instead of looking through a list of headers names in order of preference, you go through the actual headers and take the first that matches something in the preference list. Is that happening?)
That would need to be something that would be raised with the makers of Cowboy or other HTTP servers, unfortunately. It appears that Erlang maps are sorted by key, meaning that they are stable, but unordered.
1> maps:put(a, 2, #{z => 1}).
#{a => 2,z => 1}
Keyword lists would be better (those are tuple lists), but would end up being a fairly major API change for Cowboy &c. But that’s only if the order of the headers matters. According to Cowboy documentation, RFC7230 3.2.2 indicates that the order of headers with different names does not matter.
So, I would bet that Cowboy (but I can’t find the code offhand):
- parses the headers in order received;
- puts new headers in the map; and
- appends multiple instances of the same header to the header list.
It might have some optimizations for headers that are only supposed to appear once (e.g., Host) such that either the first value or the last value is used or that an exception is raised, but I’m not sure.
(An answer I can see being possible is: Instead of looking through a list of headers names in order of preference, you go through the actual headers and take the first that matches something in the preference list. Is that happening?)
@adam-p Sort of. It's true that the :headers configuration is not in order of preference. Rather, every header it names is considered.
Elixir's Plug.Conn struct represents req_headers as an ordered list of tuples. So let's say it looks something like:
[
{"accept", "*/*"},
{"x-forwarded-for", "1.1.1.1, 2.2.2.2"},
{"host", "localhost:4000"},
{"x-client-ip", "3.3.3.3"},
{"user-agent", "curl/7.77.0"},
]
We take this list and select just the headers whose names appear in the :headers configuration. The order of the req_headers is preserved irrespective of the order of the :headers configuration. So you could have
plug RemoteIp, headers: ["x-forwarded-for", "x-client-ip"]
or equivalently
plug RemoteIp, headers: ["x-client-ip", "x-forwarded-for"]
Either way, we'd whittle the above req_headers down to
[
{"x-forwarded-for", "1.1.1.1, 2.2.2.2"},
{"x-client-ip", "3.3.3.3"},
]
Each header is parsed for a list of IPs, then those lists are concatenated together, still respecting their relative ordering. Thus:
[
{1, 1, 1, 1},
{2, 2, 2, 2},
{3, 3, 3, 3},
]
So, this list is taken to represent a network path like:
- Client
1.1.1.1sends request to Proxy A - Proxy A with IP
2.2.2.2setsX-Forwarded-For: 1.1.1.1and forwards to Proxy B - Proxy B with IP
3.3.3.3setsX-Forwarded-For: 2.2.2.2and forwards to Proxy C - Proxy C (whose IP isn't in the headers) sets
X-Client-IP: 3.3.3.3and forwards to Server - Server receives request with the
req_headersfrom above
However, as I was trying to convey in the docs, there are several things "off" about this. Really, it's based on the faulty premise that the order of Elixir's req_headers is sacred. As @halostatue illuminates:
- Cowboy (the server most everyone is using) actually parses the headers into an Erlang map (which doesn't guarantee key order) before passing them on to Elixir's Plug, which converts the map into the list we see.
- Moreover, the relative ordering of different headers doesn't even matter on the protocol level! :sweat_smile: I hadn't even realized this before, so thanks for pointing that out, @halostatue.
Then there's also the matter of how multiple headers with the same name get collapsed into comma-separated values. Suppose we had
- Proxy A sets
X-Forwarded-For: 1.1.1.1 - Proxy B sets
X-Client-IP: 2.2.2.2 - Proxy C adds
X-Forwarded-For: 3.3.3.3, which gets concatenated with the existing header
This could come in to the req_headers as
[
{"x-forwarded-for", "1.1.1.1, 3.3.3.3"},
{"x-client-ip", "2.2.2.2"},
]
or
[
{"x-client-ip", "2.2.2.2"},
{"x-forwarded-for", "1.1.1.1, 3.3.3.3"},
]
Neither order gives us a list of parsed IPs that goes 1.1.1.1 -> 2.2.2.2 -> 3.3.3.3. I guess if multiple headers are getting mixed, it's up to the intermediate proxies to be pretty smart about how they're handled. :confused:
Ultimately, if we can't rely on the relative ordering of separate headers, the :headers plural option doesn't make much sense as it is. Now that I think of it, I also recall being wary of treating it as a priority-ordered list because I didn't want the default to impose one header's precedence over another's. E.g., who am I to say if X-Real-IP should be preferred over X-Client-IP?
I guess users could set their own priority after a fashion by calling the plugin multiple times:
plug RemoteIp, headers: ["x-lo-priority"]
plug RemoteIp, headers: ["x-hi-priority"]
- If neither header is set, neither call has an effect on the
remote_ip - If only
x-lo-priorityis set, then the first call clobbers theremote_ipand the second call has no effect - If only
x-hi-priorityis set, then the first call has no effect and the second call clobbersremote_ip - If both headers are set, the first call clobbers the
remote_ipwithx-lo-priority, but the second call clobbersremote_ipall over again withx-hi-priority
Although interactions might get weird here when skipping proxy IPs. Especially if the two plugins are configured with different values for their :proxies + :clients. :thinking:
Do you treat the latter two like a comma-separated list? (I think you do, but I have never tried to read Elixir.) They shouldn't be, but I guess there's no big downside. If the user has configured the lib to use one of those headers then they shouldn't be spoofable (a trusted reverse proxy set it) and treating it like a list-of-one is fine, and if the header is spoofable, then treating it like a list is the least of the user's problems. (It'd be nice to give an error or something to indicate to the user that something is wrong, but... again, if it's spoofable then it probably won't be spoofed with comma-separated values. Why would an attacker bother doing that?)
@adam-p Yes, and I followed pretty much the same line of reasoning, lol. However, now that v1.0 has custom parsers, it'd be pretty easy to make one that expects just a single IP and use that as the default for x-client-ip and x-real-ip. Or even make ReomteIp.Parsers.Generic the single-IP parser and set up RemoteIp.Parsers.XForwardedFor as the comma-separated one (but that wouldn't be backwards compatible).
So many ideas flowing from this thread! 🤩
@halostatue Thanks for the thorough breakdown in https://github.com/ajvondrak/remote_ip/issues/29#issuecomment-1094506433 :purple_heart:
It'd be interesting to offer all these different strategies. This library was always pretty opinionated about the algorithm, although everything else around it has become super configurable. So I guess making the algorithm configurable is kind of the next frontier, lol. Definitely going to be thinking on this. Want to get my thoughts organized enough to start spawning some follow-up tickets.
(I revisited your algorithm document just now and I see that the answer to my question was there. Sorry!)
@ajvondrak IMO, putting the values from different headers into a single ordered list doesn't seem right. Even ignoring the random header ordering. That's just not what the different headers mean -- they're not meant to be a single order list altogether.
However...
If the user has configured a single header (which really is what they should do), then it's okay.
If you're using a strictly rightmost algorithm AND the user's reverse proxies are replacing any headers that are configured to be looked at AND XFF is being appended properly AND X-Real-IP (etc.) is being set properly AND there's only one reverse proxy, then it's okay.
For example:
X-Forwarded-For: 1.1.1.1, 2.2.2.2
X-Real-IP: 2.2.2.2
You get one of these two lists:
1.1.1.1, 2.2.2.2, 2.2.2.2
2.2.2.2, 1.1.1.1, 2.2.2.2
In either case, the rightmost is the same.
Now let's say the server is behind Cloudflare and one more internal reverse proxy:
X-Forwarded-For: 1.1.1.1, 2.2.2.2, 103.21.244.1 (this is Cf), 10.0.0.1
Cf-Connecting-IP: 2.2.2.2
==>
1.1.1.1, 2.2.2.2, 103.21.244.1, 10.0.0.1, 2.2.2.2
2.2.2.2, 1.1.1.1, 2.2.2.2, 103.21.244.1, 10.0.0.1
Same result in either case: RightmostTrustedRangeStrategy
Different result between cases: Strict rightmost, LeftmostNonPrivateStrategy, RightmostTrustedCountStrategy, RightmostNonPrivateStrategy
(Of course, some of those aren't correct in this case no matter what. But the point is that they'll randomly give different answers.)
So... the current approach is okay for most cases, but a) it feels weird to me, and b) it falls apart if you start handling different algorithms/strategies (and maybe if the reverse proxy configuration gets complex).