
Specify source ip when connecting to backend

Open shantanugadgil opened this issue 8 years ago • 7 comments

Hi, I would like to request the 'source ip' feature described here:

https://cbonte.github.io/haproxy-dconv/1.8/configuration.html#5.2-source

My use case is identical to the one in the documentation example: I have more than 64K connections.

I haven't thought through how gobetween could pick up the ip addresses dynamically when connecting to the backend, maybe an IP address range or cidr notation?!?

Regards, Shantanu

shantanugadgil avatar Jan 07 '18 05:01 shantanugadgil

Hi @shantanugadgil, do I understand correctly that you need gobetween to do a kind of "IP spoofing", as described here: https://www.haproxy.com/blog/preserve-source-ip-address-despite-reverse-proxies/ ? Could you provide more details about your use case? Maybe draw a diagram of how your servers are structured.

This may or may not work, depending on the network routing configuration. For example, even if gobetween could spoof the IP address in IP packets, packets from the backend may go back to the client directly because of routing. So routing must be properly configured as well.

yyyar avatar Jan 13 '18 14:01 yyyar

Hi @yyyar ,

My current need is not anything as advanced as IP spoofing 😄 . It is a consequence of the 64K limit that comes from the way TCP identifies a connection (src_ip:src_port:dest_ip:dest_port).

I think the following image can illustrate my need/problem.

image

From the above image, what will (could) happen is that when the number of connections goes over 64K, the connection from gobetween to the internal load balancer could start failing. (Of course, if the load balancer exposes more than a single IP, the limit will get hit later, but in multiples of 64K per IP).

Instead of relying on the load balancer's capabilities, a recommended alternative is to have multiple private IP addresses on the gobetween machine. When gobetween talks to the backend, it randomly chooses one of those private IP addresses as the source.

This is what HAProxy's "source" configuration supports.
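For illustration, here is a minimal Go sketch of binding an outgoing connection to a chosen source IP with the standard library's `net.Dialer.LocalAddr`. It is not gobetween code: the helper name `dialerFrom`, the address `10.100.0.1`, and the 5-second timeout are assumptions made up for this example.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// dialerFrom (hypothetical helper) returns a dialer whose outgoing TCP
// connections are bound to sourceIP. The IP must already be configured
// on a local interface for a real Dial to succeed.
func dialerFrom(sourceIP string) (*net.Dialer, error) {
	ip := net.ParseIP(sourceIP)
	if ip == nil {
		return nil, fmt.Errorf("invalid source IP %q", sourceIP)
	}
	return &net.Dialer{
		// Port is left at 0 so the OS picks an ephemeral source port.
		LocalAddr: &net.TCPAddr{IP: ip},
		Timeout:   5 * time.Second, // assumed value for the sketch
	}, nil
}

func main() {
	d, err := dialerFrom("10.100.0.1") // assumed private IP
	if err != nil {
		panic(err)
	}
	fmt.Println("backend connections would originate from", d.LocalAddr)
	// A real proxy would then call, for example:
	//   conn, err := d.Dial("tcp", "hostname:8000")
}
```

Each distinct source IP gives the proxy a separate pool of source ports towards the same backend address and port.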

Thus, if I know that I will not get more than 1 million connections from the world over to my gobetween machine, I can pre-configure 16 private IP addresses (1,000,000 / 65,000) on the gobetween machine, and make gobetween randomly choose one of them while making connections on the inside.
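The arithmetic above can be sketched as a ceiling division. The function name `sourceIPsNeeded` is hypothetical, and the 65,000 usable ports per IP is the round figure from this comment; the real number of usable ephemeral ports depends on the operating system's configured port range and may be lower.

```go
package main

import "fmt"

// sourceIPsNeeded (hypothetical helper) returns how many source IPs are
// required to hold maxConns connections to a single backend ip:port,
// given portsPerIP usable source ports per IP. It rounds up.
func sourceIPsNeeded(maxConns, portsPerIP int) int {
	return (maxConns + portsPerIP - 1) / portsPerIP
}

func main() {
	// 1,000,000 / 65,000 is about 15.4, so 16 addresses are needed.
	fmt.Println(sourceIPsNeeded(1000000, 65000)) // prints 16
}
```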

HTH, Regards, Shantanu

P.S. Some more links for reference: https://mrotaru.wordpress.com/2013/10/10/scaling-to-12-million-concurrent-connections-how-migratorydata-did-it/ https://stackoverflow.com/questions/10085705/load-balancer-scalability-and-max-tcp-ports https://serverfault.com/questions/403653/websockets-loadbalancers-and-64k-ports

shantanugadgil avatar Jan 14 '18 08:01 shantanugadgil

@shantanugadgil Thanks for the explanation, got it! Technically it can be added relatively easily. And you're right, we need to define a proper syntax that covers all use cases.

We can set source IP selection either globally, with something like:

[servers.sample]
protocol = "tcp"
bind = "0.0.0.0:3000"
source = "random" # or a concrete IP, e.g. "1.2.3.4"; used globally for all backend connections

  [servers.sample.discovery]
  kind = "static"
  static_list = [
      "hostname:8000"
  ] 

Or make it use the value from a backend "source" tag:

[servers.sample]
protocol = "tcp"
bind = "0.0.0.0:3000"

  [servers.sample.discovery]
  kind = "static"
  static_list = [
      "hostname:8000 source=1.2.3.4",   # or source="random"
      "hostname:8000 source=1.2.3.5",
      "hostname:8000 source=1.2.3.6",
  ] 

Actually there are more options, as in HAProxy:

  • Using the IP to which client connected
  • Using client IP address (spoofing); this needs transparent proxy support to be added and proper routing to be configured

When using docker or other discovery types, we can grab the "source" tag from the container label.

@illarion @nickdoikov what do you think?

yyyar avatar Jan 14 '18 14:01 yyyar

@yyyar I could really use the IP spoofing described in https://www.haproxy.com/blog/preserve-source-ip-address-despite-reverse-proxies/ , but I need it for UDP packets.

jcnorris00 avatar Jan 17 '18 22:01 jcnorris00

I think 'roundrobin' may be a better/alternative strategy than 'random' while choosing the ip to connect to the backend.
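As a rough sketch of the difference, a round-robin picker cycles through the configured source IPs in order, so load is spread evenly rather than statistically. The type `roundRobin` and its methods are hypothetical names for this example, not part of gobetween.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// roundRobin (hypothetical type) hands out source IPs in a fixed cycle.
// It is safe for concurrent use because the counter is updated atomically.
type roundRobin struct {
	ips  []string
	next uint64
}

func newRoundRobin(ips []string) *roundRobin {
	return &roundRobin{ips: ips}
}

// pick returns the next source IP in the cycle.
// It assumes the list is non-empty.
func (r *roundRobin) pick() string {
	n := atomic.AddUint64(&r.next, 1) - 1
	return r.ips[n%uint64(len(r.ips))]
}

func main() {
	rr := newRoundRobin([]string{"10.100.0.1", "10.100.0.2", "10.100.0.3"})
	for i := 0; i < 4; i++ {
		fmt.Println(rr.pick()) // cycles .1, .2, .3, then .1 again
	}
}
```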

Of course, it would be awesome if gobetween could also dynamically add/remove ips based on how we plumb/unplumb ips to the vm instance.

Adding/Removing IPs could be an easy possibility on AWS EC2.

A strategy for specifying the private IPs to use could be based on range+cidr notation.
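One possible way to turn a CIDR range into a list of candidate source IPs is sketched below using `net.ParseCIDR` from the Go standard library. The helper `expandCIDR` and its `limit` parameter are assumptions for this example; the sketch returns every address in the block, including the network and broadcast addresses, and does not check that an address is actually plumbed on the machine.

```go
package main

import (
	"fmt"
	"net"
)

// expandCIDR (hypothetical helper) lists up to limit addresses contained
// in the given CIDR block, starting at the network address.
func expandCIDR(cidr string, limit int) ([]net.IP, error) {
	_, ipnet, err := net.ParseCIDR(cidr)
	if err != nil {
		return nil, err
	}
	// Work on a copy so the parsed network is not modified.
	cur := make(net.IP, len(ipnet.IP))
	copy(cur, ipnet.IP)

	var out []net.IP
	for ipnet.Contains(cur) && len(out) < limit {
		ip := make(net.IP, len(cur))
		copy(ip, cur)
		out = append(out, ip)
		// Increment the address by one, carrying from the last byte.
		for i := len(cur) - 1; i >= 0; i-- {
			cur[i]++
			if cur[i] != 0 {
				break
			}
		}
	}
	return out, nil
}

func main() {
	ips, err := expandCIDR("10.100.0.0/30", 16)
	if err != nil {
		panic(err)
	}
	for _, ip := range ips {
		fmt.Println(ip) // 10.100.0.0 through 10.100.0.3
	}
}
```

A /16 such as the one in this thread holds 65,536 addresses, which is why the sketch takes a limit.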

shantanugadgil avatar Jan 22 '18 15:01 shantanugadgil

Hi @yyyar

Re-reading your comments about how to specify the "source", I am confused about how/why the source field can be added to the static_list (i.e. the backend).

Setting the algorithm for the "source" should be a "sender-side" decision rather than a receiver's decision, right?

So the following makes more sense to me:

[servers.sample]
protocol = "tcp"
bind = "0.0.0.0:3000"
source = "roundrobin" 
source_range = "10.100.0.0/16"

OR

[servers.sample]
protocol = "tcp"
bind = "0.0.0.0:3000"
source = "roundrobin" 
source_list = ["10.100.0.1/16", "10.100.0.2/16", "10.100.0.3/16"]

OR

if, in the TOML way, you want the "source" as a completely separate sub-section, for example:

[servers.sample.source]
method = "roundrobin"
ip_list = ...
ip_range = ...

Regards, Shantanu

shantanugadgil avatar Apr 15 '18 06:04 shantanugadgil

Any updates? 👍

frxncisjoseph avatar Nov 10 '18 11:11 frxncisjoseph