teleport icon indicating copy to clipboard operation
teleport copied to clipboard

tsh proxy db --tunnel cert expiration callback

Open smallinsky opened this issue 1 year ago • 10 comments

What:

tsh proxy db part of https://github.com/gravitational/webapps.e/issues/299

Add ability to the tsh proxy db command to handle cert expiration gracefully with ability to notify Teleport Connect about database expired cert in order to improve UX.

In case of expired certs the connection ends with vague user message like:

psql postgres://postgres@localhost:51643                                                                                                                           [11:34:07]
psql: error: connection to server at "localhost" (::1), port 51643 failed: Connection refused
        Is the server running on that host and accepting TCP/IP connections?
connection to server at "localhost" (127.0.0.1), port 51643 failed: server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.

teleport local proxy can monitor and detect if database certs are expired and call a callback function that will allow to notify Teleport Connect about expired certs.

smallinsky avatar Jul 07 '22 13:07 smallinsky

The way I interpret this task is that the proxy itself isn't going to manage its certs in any way but just fire a callback if it detects that they expired.

Is it going to be possible to make a running proxy use a new set of certs? This would let us not interrupt any connections that were already established before the certs have expired.

ravicious avatar Jul 18 '22 15:07 ravicious

Is it going to be possible to make a running proxy use a new set of certs? This would let us not interrupt any connections that were already established before the certs have expired.

@ravicious we had a discussion about whether the local proxy can use a new set of cert. Technically, it's doable. It can just establish the upstream connection to the remote with the new cert while holding the downstream connection. The real problem is most Database connections are "stateful" (packet sequence number, connection id, etc.). So it would be best for the db client to handle the reconnection.

The way I interpret this task is that the proxy itself isn't going to manage its certs in any way but just fire a callback if it detects that they expired.

Yes. A callback function will be added to the local proxy and it will be triggered right when the provided cert expired. Though, since cert is provided from the outside to the local proxy, technically the outside can do this itself. it will be a simple goroutine sleep until the "not valid after" time (maybe plus minus some time offset say 5 minutes before expire). I plan to use this new callback functionality for tsh proxy db as well.

greedy52 avatar Jul 18 '22 16:07 greedy52

@greedy52 Understood, so it sounds like we'll have to continue to restart the proxy after relogin, right?

ravicious avatar Jul 19 '22 09:07 ravicious

@greedy52 Understood, so it sounds like we'll have to continue to restart the proxy after relogin, right?

Yes. That would be the simplest.

greedy52 avatar Jul 19 '22 13:07 greedy52

@ravicious sorry for the delay. I was prototyping this last week but realized a slight problem with this plan.

For tsh proxy db without the --tunnel mode, the certs are provided to the database clients NOT the local proxy. So in this case the local proxy will not know when to do the callback.

So ideally the callback should be launched by the owner of the certs instead of the ALPN local proxy. Do you think this can be handled in teleterm instead of the ALPN local proxy? For example, you can have the callback in the gateway.Config and start a goroutine in gateway.New waiting until cert's valid before time (or when gateway is closed)

Let me know what you think. Thanks!

greedy52 avatar Aug 05 '22 13:08 greedy52

I'm not sure if firing a callback after some time passes would satisfy the UX we're aiming for. We'd at least need to be notified by the proxy that a request was sent to the proxy.

The way we'd like it to work is this:

  1. The user opens up Teleport Connect in the morning and logs in to the cluster.
  2. Connect starts an authenticated tunnel to the database server.
  3. The user launches a GUI client and connects to the tunnel.
  4. The next day, the user opens up the GUI client again and tries to interact with the db.
  5. The request fails because the certs have expired.
  6. Connect sees that the user attempted to connect to the db after the cert has expired. Connect brings focus to its window and shows a modal saying something like "It looks like you were trying to connect to foo but your session has expired. Please log in and try again.".
  7. The user logs in, goes back to the GUI client and continues whatever work they were doing.

If I understand you correctly, you propose that the tsh daemon should create a goroutine that fires a callback just after the cert expires. We could manage that goroutine ourselves but we'd still want to bring focus to Connect only when the user attempts to contact the proxy after the cert expires, not immediately after it expires.


I started to write "In general though, wouldn't it be beneficial for all three methods which set up a proxy (tsh proxy db, tsh proxy db --tunnel and creating a gateway in Connect) to have a way to inform the user about an expired cert?". But soon I realized that for tsh it might be as simple as just printing a message after the cert expires. 😏 For Connect we want to have a slightly different UX though.

ravicious avatar Aug 09 '22 10:08 ravicious

Now that I think about it, we probably wouldn't even need a separate goroutine. If the proxy fired a callback each time a request comes through, tshd could just inspect in that callback if the time has passed (or the cert has expired). I'm not sure how feasible it is for the proxy to do that though.

ravicious avatar Aug 09 '22 10:08 ravicious

@ravicious i was thinking about the extra trigger-when-cert-expired goroutine can set Gateway in a special state say closed where alpn local proxy is closed but other information like database and local port are still intact since the gateway isn't delted.

wouldn't it be beneficial for all three methods which set up a proxy (tsh proxy db, tsh proxy db --tunnel, and creating a gateway in Connect) to have a way to inform the user about an expired cert?

Yes, we want for both tsh and tshd. In fact, tsh proxy app and tsh proxy aws also uses ALPN local proxy so it's more than just DBs. Just can't do the cert expired check by the ALPN local proxy as it doesn't have the cert all the time.

If the proxy fired a callback each time a request comes through, tshd could just inspect in that callback if the time has passed (or the cert has expired.

If request means new connections (e.g PreNewConnection), it can certainly be done and it may not be a bad idea. Probably a PostConnectionClose also works so you can check the error coming back. Though for tsh proxy db, I was thinking about printing some warning say 20 minutes before expiring through a callback, then close tsh when it actually expires so that's why I was thinking about the expire callback approach.

Let me know if you think the connection state callback is the way to go i can quickly draft something for you.

greedy52 avatar Aug 09 '22 13:08 greedy52

If request means new connections (e.g PreNewConnection), it can certainly be done and it may not be a bad idea.

Yeah, sorry I wasn't clear enough, I meant new connections. AFAIR right now once you make a connection through the local proxy, you're able to interact with the db even after the cert expires, aren't you?

i was thinking about the extra trigger-when-cert-expired goroutine can set Gateway in a special state say closed where alpn local proxy is closed but other information like database and local port are still intact since the gateway isn't delted.

Ah, I see now. I think this could work and it'd also eliminate the problem I described above where existing connections would remain intact even after the cert expires. It'd just require substantially more changes in Connect as it obviously doesn't handle gateways this way today – either a gateway is in the gateways map and accepts connections or it doesn't and isn't in the map.

Let me know if you think the connection state callback is the way to go i can quickly draft something for you.

I need more time to think about it. However, I think no matter what we do, we want to be able to react to the user trying to establish a connection after the cert has expired and it seems we won't be able to do it without such callback, so I think it's worth doing it. So yes please, I'll appreciate any drafts though I'll probably won't be able to look into them until next week.

Putting the gateway in a special state and closing the alpn local proxy is a separate thing though, I'll talk with the team about it tomorrow.

ravicious avatar Aug 09 '22 16:08 ravicious

I'm a bit bogged down by some troubles with signing Connect builds for Touch ID so if you'd like to talk more before committing to a solution here just let me know. As I said, it'll take me a few days before I can actually take any draft from you and take it for a spin.

ravicious avatar Aug 09 '22 16:08 ravicious

I thought about it and for Connect the connection state callback seems like the way to go for now. Ultimately this is what we want to implement – focusing on the Connect app window when a user attempts to connect to a proxy which uses an expired cert.

Closing the proxy once the cert expires is reasonable too but it's simply not something the team has planned around and it'd increase the scope substantially. As I mentioned, I'll bring this up today during our design meeting as it seems to me that it's something we'll want to add eventually.

ravicious avatar Aug 10 '22 13:08 ravicious

I confirmed with the team that we don't need to be proactive for now (perform any action as soon as the cert expires) but rather react to the action of the user (for example, attempting to make a new connection after the cert has expired). So for now Connect would need only that callback so that we can see that the user attempted to make a new connection.

ravicious avatar Aug 10 '22 16:08 ravicious

@ravicious here is a draft for the callback and an example using it for tsh proxy db. If it works out for you, I can "productize" this change. I do have a few other works on hand so feel free to steal the draft in your changes too if i couldn't get to it on time.

https://github.com/gravitational/teleport/pull/15398/files

greedy52 avatar Aug 10 '22 20:08 greedy52

Thanks, I'll give it a try! Hopefully early next week.

ravicious avatar Aug 11 '22 09:08 ravicious

closed by #16958

greedy52 avatar Nov 09 '22 15:11 greedy52