
Tunneling from PSDB to External MySQL

Open enisoc opened this issue 4 years ago • 2 comments

I have a question about whether sql-proxy could be adapted to also help with another use case we have.

We plan to implement a feature to help users migrate data from an external MySQL instance (e.g. RDS) into PSDB. In PSv1, we did this by directly connecting out from our infrastructure to an address provided by the user, but it had two problems we never fully solved:

  1. We had to ask users to open up their database to connections from the public internet. We could give them a list of our possible source IPs, but this was still a hard sell, especially when it came time to connect to their production database. We also couldn't completely guarantee that the set of IPs would never change, since we might need to add more over time.
  2. Making outgoing connections from our infrastructure to an arbitrary, user-provided address created a security risk for us. The user might be able to trick us into connecting to one of our own services that would see the traffic as coming from a host inside our own network. We used network egress policies to block connections to private IPs, but the risk still existed for any VMs or other endpoints in our VPC that also had public IPs.

Could we adapt sql-proxy to make the tunnel usable in the reverse direction as well, to facilitate outgoing connections from PSDB to an endpoint in the user's private network?

I'm imagining something like this:

  1. User runs sql-proxy-client in their private network, which has access to their current production database. They configure it for "inbound" mode and point it at the desired database endpoint.
  2. This agent connects out from the user's network to a PSDB endpoint to establish a tunnel through which TCP connections could be established.
  3. When PSDB needs to connect to the user's database, it actually connects to a PSDB-internal endpoint to request that a particular user-established tunnel be used.

What do you think?

enisoc · Jan 14 '21 20:01

Hi @enisoc

This looks doable. For it to work, sql-proxy-client still needs to connect to PSDB. We need to create a bi-directional stream, possibly using a library like https://github.com/hashicorp/yamux to make this functionality easier to write. The sql-proxy-client then needs to reverse proxy each incoming TCP connection to the production database. One question about this function: is the user's production DB TLS protected? If yes, additional work is needed on the sql-proxy-client side: it has to create a second TLS client for the user's database. Of course, this is not needed if the user's database can be served from the same network where sql-proxy-client is running.

In terms of TLS tunneling, because the client establishes a bi-directional stream to PSDB (sql-proxy-server), it does not matter that the server is the one opening a connection to sql-proxy-client. yamux multiplexes streams over the single existing connection and either side can open a new stream, so the server can open one in the reverse direction without dialing a new TCP connection.

One thing to note: PSDB should start a connection to the user's database only when such a tunnel exists. So sql-proxy-server needs to know programmatically that the client has already established a link to us, which is what allows us to open a reverse proxy tunnel. What happens if sql-proxy-client hasn't established a connection yet? These are small edge cases, but worth noting.

Also, right now, there is no warm pool: sql-proxy-client opens connections to the server on the fly, and only when the user's MySQL client connects to sql-proxy-client. We want to fix this by creating a ready-to-use connection pool from sql-proxy-client to sql-proxy-server, which is required for the feature you've described.

fatih · Jan 20 '21 10:01

> is the user's production DB TLS protected?

Even though the purpose of sql-proxy-client is to sit inside the user's private network, some users will still want to use TLS inside that network. So I think we should plan to support this eventually, but it's probably not a launch-blocker for the PSDB "data import" feature in question.

> What happens if sql-proxy-client hasn't established a connection yet?

I imagine that the Vitess processes we run on behalf of the user will blindly attempt to connect to sql-proxy-server and ask for a given external instance. If no sql-proxy-client tunnel has come in yet, sql-proxy-server would just refuse the connection; we would tell the user that we couldn't connect because the tunnel is not established, and keep retrying at an interval.

That also raises another question: Do we need to run multiple copies of sql-proxy-server for load-balancing? If so, how will we route the outgoing connection from PSDB to the instance of sql-proxy-server that's connected to the tunnel? Maybe we would just start with one instance (for outgoing purposes) for now?
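One possible answer to the routing question, sketched in Go under the assumption that whichever sql-proxy-server replica accepts a client's link records itself as the owner of that tunnel in a shared directory, and PSDB looks up the owner before connecting. The in-memory `TunnelDirectory` is a hypothetical stand-in for whatever shared store would actually be used; with a single outgoing instance, `Route` would simply always return that one address.

```go
package main

import (
	"errors"
	"sync"
)

// errNoRoute means no replica currently holds a tunnel for the instance.
var errNoRoute = errors.New("no sql-proxy-server replica holds this tunnel")

// TunnelDirectory records which sql-proxy-server replica owns the tunnel
// for each external instance. (Hypothetical; a real one would be shared.)
type TunnelDirectory struct {
	mu     sync.Mutex
	owners map[string]string // instance name -> replica address
}

func NewTunnelDirectory() *TunnelDirectory {
	return &TunnelDirectory{owners: make(map[string]string)}
}

// Claim is called by the replica that accepted the client's link. A later
// claim (e.g. after the client reconnects elsewhere) replaces an earlier one.
func (d *TunnelDirectory) Claim(instance, replica string) {
	d.mu.Lock()
	defer d.mu.Unlock()
	d.owners[instance] = replica
}

// Release removes the entry only if replica still owns it, so a stale
// replica cannot erase a newer claim.
func (d *TunnelDirectory) Release(instance, replica string) {
	d.mu.Lock()
	defer d.mu.Unlock()
	if d.owners[instance] == replica {
		delete(d.owners, instance)
	}
}

// Route tells PSDB which replica to send an outgoing connection to.
func (d *TunnelDirectory) Route(instance string) (string, error) {
	d.mu.Lock()
	defer d.mu.Unlock()
	replica, ok := d.owners[instance]
	if !ok {
		return "", errNoRoute
	}
	return replica, nil
}
```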

enisoc · Jan 20 '21 17:01