
Tunnel not working after upgrade to 3.12.3

Open andrewcoelho opened this issue 8 months ago • 46 comments

I updated from 3.9.4 to 3.12.3, which may or may not be related, but since then I can't get the tunnel to work.

// npx sst tunnel --stage production --print-logs

time=2025-04-03T09:46:47.644-04:00 level=INFO msg="checking for pulumi" path="/Users/andrew/Library/Application Support/sst/bin/pulumi"
time=2025-04-03T09:46:48.541-04:00 level=INFO msg="checking for bun" path="/Users/andrew/Library/Application Support/sst/bin/bun"
time=2025-04-03T09:46:49.155-04:00 level=INFO msg="starting tunnel"
time=2025-04-03T09:46:49.155-04:00 level=INFO msg="initializing project" version=3.12.3
time=2025-04-03T09:46:49.156-04:00 level=INFO msg="esbuild building" out=/Users/andrew/Projects/xxx/.sst/platform/sst.config.1743688009156.mjs
time=2025-04-03T09:46:49.166-04:00 level=INFO msg="esbuild built" outfile=/Users/andrew/Projects/xxx/.sst/platform/sst.config.1743688009156.mjs
time=2025-04-03T09:46:49.167-04:00 level=INFO msg="evaluating config"
time=2025-04-03T09:46:49.202-04:00 level=INFO msg="config evaluated"
time=2025-04-03T09:46:49.204-04:00 level=INFO msg="checking platform"
time=2025-04-03T09:46:49.205-04:00 level=INFO msg="checking provider" name=command version=1.0.2 compare=1.0.2
time=2025-04-03T09:46:49.205-04:00 level=INFO msg="loading home"
time=2025-04-03T09:46:50.007-04:00 level=INFO msg="aws credentials found" region=us-east-2 profile=xxx-prod
time=2025-04-03T09:46:50.007-04:00 level=INFO msg="fetching bootstrap"
time=2025-04-03T09:46:50.251-04:00 level=INFO msg="found existing bootstrap" data="{\"version\":5,\"asset\":\"sst-asset-xxx\",\"assetEcrRegistryId\":\"xxx\",\"assetEcrUrl\":\"xxx.dkr.ecr.us-east-2.amazonaws.com/sst-asset\",\"state\":\"sst-state-xxx\",\"appsyncHttp\":\"\",\"appsyncRealtime\":\"\"}"
time=2025-04-03T09:46:50.251-04:00 level=INFO msg="loaded config" app=xxx stage=production
time=2025-04-03T09:46:50.252-04:00 level=INFO msg="INFO getting passphrase app=xxx stage=production"
time=2025-04-03T09:46:50.432-04:00 level=INFO msg="INFO pulling state app=xxx stage=production out=/Users/andrew/Projects/xxx/.sst/pulumi/xxx/.pulumi/stacks/xxx/production.json"
time=2025-04-03T09:46:51.045-04:00 level=INFO msg="starting tunnel" cmd="[sudo -n -E /opt/sst/tunnel tunnel start --subnets 10.0.4.0/22,10.0.12.0/22,10.0.0.0/22,10.0.8.0/22 --host xxx --user ec2-user --print-logs]"
Tunnel

➜  Forwarding ranges
   10.0.4.0/22
   10.0.12.0/22
   10.0.0.0/22
   10.0.8.0/22

Waiting for connections...

time=2025-04-03T09:48:06.371-04:00 level=INFO msg="killing process" pid=2173
time=2025-04-03T09:48:06.372-04:00 level=INFO msg="killing process" pid=2173
time=2025-04-03T09:48:06.372-04:00 level=INFO msg="process killed with term" pid=2173
time=2025-04-03T09:48:06.372-04:00 level=INFO msg="untracked process" pid=2173
time=2025-04-03T09:48:06.372-04:00 level=ERROR msg="failed to send sigterm" pid=2173

As you can see, the tunnel seemingly runs for about 2 minutes before the process is killed. During those 2 minutes, trying to connect to my database from npx sst shell --stage production and then psql -h <hostname> gives me a "server closed connection" error.

When I try to run the tunnel again, it exits immediately.

I've tried restarting my computer and switching between multiple networks that otherwise work fine, with no luck getting this to work.

Additional logs from sst diagnostic:

time=2025-04-03T09:31:49.882-04:00 level=INFO msg="tunnel started"
time=2025-04-03T09:33:04.855-04:00 level=ERROR msg="failed to start tunnel" error="dial tcp xxx:22: connect: operation timed out"
time=2025-04-03T09:33:04.857-04:00 level=INFO msg="running command" command="[ifconfig utun69 destroy]"
time=2025-04-03T09:33:04.865-04:00 level=ERROR msg="failed to execute command" command="[ifconfig utun69 destroy]" 

Not sure if relevant, but for

sudo -n -E /opt/sst/tunnel tunnel start --subnets 10.0.4.0/22,10.0.12.0/22,10.0.0.0/22,10.0.8.0/22 --host x.x.x.x --user ec2-user --print-logs

the value printed after --host does not match any of the IPs of the EC2 instances I can see in the console.
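
One way to compare, assuming the AWS CLI is configured for the same profile and region as the stage, is to list the public IPs of the running instances and check them against the --host value:

aws ec2 describe-instances \
  --filters "Name=instance-state-name,Values=running" \
  --query "Reservations[].Instances[].[InstanceId,PublicIpAddress]" \
  --output table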

My sst.config.ts has:

const vpc = new sst.aws.Vpc('Vpc', {
  bastion: true,
  nat: 'ec2',
});

ETA: I renamed my VPC and tried re-deploying. This failed, so I undid the rename on the VPC. That appears to have created some new EC2 instances in my account, and the tunnel is now working, printing an existing EC2 instance's IP address after --host.

While the problem is resolved for now, I'm still curious to know why SST was trying to use the wrong IP value for --host after I upgraded to 3.12.3 and re-deployed.

andrewcoelho avatar Apr 03 '25 13:04 andrewcoelho

Same here!

amineelmoussafer avatar Apr 03 '25 20:04 amineelmoussafer

Glad I'm not the only one - this is quite an important issue to be addressed

Liamandrew avatar Apr 04 '25 00:04 Liamandrew

trying to recreate

thdxr avatar Apr 04 '25 18:04 thdxr

did any of you try an sst refresh when it wasn't working? i couldn't recreate this

thdxr avatar Apr 04 '25 19:04 thdxr

Yeah I tried sst refresh a couple times with no luck.

andrewcoelho avatar Apr 04 '25 19:04 andrewcoelho

I can confirm that the tunnel only works for the personal stage. I tried downgrading to previous versions (3.11.21 and 3.9.4) and running sst refresh, without luck. Help would be greatly appreciated 🙏.

gmathieu avatar Apr 05 '25 17:04 gmathieu

I found a workaround by tunneling to your personal stage. The trick is to deploy your personal stage that points to the same VPC used in the production stage.

Here's how to reproduce:

  1. Grab the production's VPC ID (skip the sub-steps if you already know it):

    1. Update sst.config.ts to expose your production's VPC ID

      export default $config({
        // ...
        async run() {
          const vpc = new sst.aws.Vpc('vpc', { bastion: true, nat: 'ec2' })
          // ...
          return { vpcId: vpc.id }
        },
      })
      
    2. Output the VPC ID by running sst refresh --stage production

  2. Update your sst.config.ts to reuse production's VPC

    export default $config({
      async run() {
        sst.aws.Vpc.get("vpc", "VPC's ID")
        // I recommend removing all other components since we don't want to deploy to production
      },
    })
    
  3. Deploy your personal stage: sst deploy

  4. Undo any changes to sst.config.ts

  5. Tunnel using your personal stage: sst tunnel

  6. Run whatever you need on the production stage: sst shell --stage production ...

Note: both the production stage and your personal stage must be deployed in the same AWS account.
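
For reference, the whole round trip as commands (stage names follow the steps above):

npx sst refresh --stage production   # step 1: prints the vpcId output
npx sst deploy                       # step 3: deploy your personal stage pointing at the same VPC
npx sst tunnel                       # step 5: tunnel via the personal stage's bastion
npx sst shell --stage production     # step 6: then e.g. psql -h <hostname>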

gmathieu avatar Apr 05 '25 18:04 gmathieu

Getting the same on a 3.12.3 build.

Thevetat avatar Apr 06 '25 07:04 Thevetat

This is fixed for me after upgrading to 3.12.6 🎉

Liamandrew avatar Apr 06 '25 22:04 Liamandrew

Still running into this problem with 3.13.1. I tried the workaround of tunneling on a personal stage; my connection gets tunneled but still seems to time out without reaching the database. Eventually it did start working. I think there may be 2 issues here:

  1. The tunnel on the production stage is not resolving the right IP (no idea why). It seems to work on any non-production stage if you use Vpc.get(...).
  2. I think Aurora may just take a long time to spin up? This was a new stage for me in production, and after about an hour or two the connection finally made it through and I could run my migrations.

ghardin1314 avatar Apr 07 '25 16:04 ghardin1314

I think I narrowed it down, at least for my case. The issue seems to be with reusing the NAT EC2 instance for the bastion. When I remove the NAT from the VPC and it redeploys a standalone bastion instance, the tunnel works fine. When I add the NAT back, the tunnel seems not to resolve the right IP for either of the NAT instances.

Image

Image
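
A minimal sketch of the two variants being compared here ('Vpc' is an example name, and the behavior notes restate the observation above rather than anything verified in SST internals):

// tunnel works: the VPC deploys a standalone bastion instance
const vpc = new sst.aws.Vpc('Vpc', { bastion: true });

// tunnel resolves the wrong IP: the bastion is reused from one of the EC2 NAT instances
// const vpc = new sst.aws.Vpc('Vpc', { bastion: true, nat: 'ec2' });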

ghardin1314 avatar Apr 07 '25 16:04 ghardin1314

Not sure if relevant, but for

sudo -n -E /opt/sst/tunnel tunnel start --subnets 10.0.4.0/22,10.0.12.0/22,10.0.0.0/22,10.0.8.0/22 --host x.x.x.x --user ec2-user --print-logs

the value printed after --host does not match any of the IPs of the EC2 instances I can see in the console.

I've seen this too. Tunneling failed due to the wrong IP.

I downgraded to 3.10.7 and tested with a personal stage. Only after removing and redeploying did the IP change. Tunneling a 3.12.3 deployment with 3.10.7, without redeploying, still shows the wrong IP.

EDIT: Tried with sst tunnel --stage mystage instead of sst dev tunnel today. No luck. The IP is wrong again. Tunneling to the personal stage via just sst tunnel resolves a correct IP.

pauljasperdev avatar Apr 08 '25 10:04 pauljasperdev

I tried a ton of things - upgrading, downgrading, re-deploying, managed/ec2 NAT, etc. None worked; then I restarted my Mac and it worked 🤷

natew avatar Apr 09 '25 00:04 natew

I think I narrowed it down, at least for my case. The issue seems to be with reusing the NAT EC2 instance for the bastion. When I remove the NAT from the VPC and it redeploys a standalone bastion instance, the tunnel works fine. When I add the NAT back, the tunnel seems not to resolve the right IP for either of the NAT instances.

Image

Image

This worked for me for production stage.

export const vpc = new sst.aws.Vpc('MyVpc', {
  bastion: true,
  nat: 'managed',
  // nat: 'ec2',
});

Changing to nat: 'managed' redeployed a bastion host and I could tunnel to it. After applying schema changes I switched back.

pauljasperdev avatar Apr 09 '25 10:04 pauljasperdev

I tried a ton of things - upgrading, downgrading, re-deploying, managed/ec2 NAT, etc. None worked; then I restarted my Mac and it worked 🤷

I also just restarted my Mac and it seems to be working correctly with nat: 'ec2'. Weird.

Edit: I spoke too soon. It's working fine on my dev stage but not on production.

ghardin1314 avatar Apr 09 '25 16:04 ghardin1314

we have an issue that i'm working on a fix for - some of your issues are caused by hanging .server files in the .sst folder

if you delete those it may start to work

thdxr avatar Apr 09 '25 16:04 thdxr

we have an issue that i'm working on a fix for - some of your issues are caused by hanging .server files in the .sst folder

if you delete those it may start to work

@thdxr I am experiencing the same issues and don't even have .server files in the .sst folder, because I did:

rm -rf .sst
sst install
sst tunnel --stage=myStage --print-logs

the printed logs consistently show an attempt to connect to the wrong IP (locally, in CI, ...). it only gets the right IP for the personal stage

given that this happens to me also on a CI linux machine (and i checked that the wrong IPs are consistent...), i doubt it has to do specifically with macos

partmor avatar Apr 09 '25 16:04 partmor

Can confirm it works when I switch over to a managed NAT instance for my VPC. Switching back to ec2 breaks it again.

fvaldes33 avatar Apr 10 '25 22:04 fvaldes33

Can also confirm switching to managed nat sorted this out (after cleaning up a few failed elastic IP attachment issues from the switch)

bigbyte-dom avatar Apr 11 '25 01:04 bigbyte-dom

On "sst": "3.13.9" (on a Mac). The removal of .sst worked yesterday but weirdly not today; the change to nat: 'managed' does work, as others said. The IP in the sst tunnel logs with nat: 'ec2' doesn't match any EC2 instance public IP.

In the mix, random immediate exits/crashes of sst tunnel like in https://github.com/sst/sst/issues/5008, nothing useful in --print-logs though.

n-batalha avatar Apr 11 '25 17:04 n-batalha

@partmor can you do sst state export --stage=mystage and see if the old IP is in the state file somewhere?
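
For example, assuming the stale IP is the one the tunnel prints after --host:

npx sst state export --stage=mystage > state.json
grep -n "x.x.x.x" state.json   # replace x.x.x.x with the IP from the tunnel's --host flag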

thdxr avatar Apr 11 '25 17:04 thdxr

@thdxr I can confirm the old IP that the tunnel is trying to connect to is still in the state file. It seems to be a leftover resource, because the actual EC2 NATs are also present in the state file. I can send it if that would be helpful.

Also, any ETA on the fix landing? This is really hanging up our ability to get a few things out the door.

ghardin1314 avatar Apr 13 '25 18:04 ghardin1314

To be able to access the tunnel, I managed to connect by following these steps:

  1. Remove the .server files in the .sst folder.
  2. In your sst.config, add the following line: sst.aws.Vpc.get('vpc', 'your-vpc-id');
  3. Just run the development environment with: npx sst dev

With this setup, there's no need to deploy to different stages while waiting for a fix for this issue.
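
A minimal sketch of such a config (the app name and VPC ID are hypothetical placeholders):

/// <reference path="./.sst/platform/config.d.ts" />
export default $config({
  app() {
    return { name: 'my-app', home: 'aws' }; // hypothetical app name
  },
  async run() {
    // attach to the existing VPC so the tunnel can locate its bastion
    sst.aws.Vpc.get('vpc', 'vpc-0123456789abcdef0'); // hypothetical VPC ID
  },
});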

amineelmoussafer avatar Apr 14 '25 15:04 amineelmoussafer

I'm using sst version 3.13.10 and this is still an issue; when the tunnel is running I'm not able to access resources such as the RDS DB.

jeremyhicks avatar Apr 16 '25 13:04 jeremyhicks

fwiw, i was able to fix this today by doing npx sst deploy --stage production --target MyVpcName. after running that, the IPs started matching up properly.

jb-chief avatar Apr 16 '25 18:04 jb-chief

Thanks, but that didn't work for me. The tunnel still disconnects. I'm trying to connect to the production database. When the tunnel is running I get this message:

connection to server at "10.0.15.185", port 5432 failed: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request

jeremyhicks avatar Apr 16 '25 19:04 jeremyhicks

Deleting the .sst folder and moving to managed fixed it for me. Still an issue on 3.13.14. This was how I got it to finally work:

export const vpc = new sst.aws.Vpc('Vpc', { bastion: true, nat: 'managed' });

First, I flipped nat from ec2 to managed and got some errors about resource conflicts. Then I renamed it to Vpc2, which caused more errors like: The new Subnets are not in the same Vpc as the existing subnet group. Then I renamed it back to Vpc and it worked itself out 🤷.

Another interesting change around this in the past few months is that I have to run sst tunnel separately from my dev process. I remember that I didn't need to do that at one time.

mwood23 avatar Apr 18 '25 16:04 mwood23

I had the same issue (only on a deployed prod stage, not on dev while running sst).

I tried to do it like this:

  • Start sst tunnel --stage prod --print-logs
  • Start drizzle studio

On Drizzle Studio I get an ECONNRESET and I don't see any IPs forwarded.

Image

After some time the tunnel was killed:

time=2025-04-25T07:24:23.461+02:00 level=INFO msg="killing process" pid=35807
time=2025-04-25T07:24:23.462+02:00 level=INFO msg="process killed with term" pid=35807
time=2025-04-25T07:24:23.462+02:00 level=INFO msg="untracked process" pid=35807
time=2025-04-25T07:24:23.462+02:00 level=INFO msg="killing process" pid=35807
time=2025-04-25T07:24:23.462+02:00 level=ERROR msg="failed to send sigterm" pid=35807

Tried the following until it worked:

  1. Remove .server files -> not available in .sst
  2. Remove .sst folder and do sst install -> Same issue
  3. Upgraded from 3.13.14 to 3.13.18 -> Same issue
  4. Changing my VPC from { bastion: true, nat: 'ec2' } to new sst.aws.Vpc('MyVpc', { bastion: false }) to remove EC2 and add it back again -> Same issue
  5. Changing the VPC NAT from ec2 to managed -> Error while deploying: waiting for EC2 NAT Gateway (nat-076890c52b4b5a002) create: unexpected state 'failed', wanted target 'available'. last error: Resource.AlreadyAssociated: Elastic IP address [eipalloc-07ba5cda18b230d4c] is already associated: [email protected] -> WORKS

After step 5 it worked. But I don't want to be in this weird state of wanting to deploy a managed NAT gateway (which is too expensive for this case anyway) plus facing a deployment error. So I tried going back to EC2:

  6. Changing it back to ec2 -> Same error as before
  7. Renaming the VPC from MyVpc to MyVpc2 -> Error while deploying: operation error RDS: ModifyDBSubnetGroup, https response error StatusCode: 400, RequestID: ada5ea40-1b4a-4381-8cf4-eb9f84ac8f11, api error InvalidParameterValue: The new Subnets are not in the same Vpc as the existing subnet group: [email protected]
  8. Going back to the original name of the VPC -> Same error
  9. sst refresh --stage prod -> Maybe it's updating the IPs? Not sure where they are in the state exactly -> Now the tunnel is immediately failing
time=2025-04-25T07:44:21.718+02:00 level=INFO msg="starting tunnel" cmd="[sudo -n -E /opt/sst/tunnel tunnel start --subnets 10.0.4.0/22,10.0.12.0/22,10.0.0.0/22,10.0.8.0/22 --host 3.122.53.251 --user ec2-user --print-logs]"
Tunnel

➜  Forwarding ranges
   10.0.4.0/22
   10.0.12.0/22
   10.0.0.0/22
   10.0.8.0/22

Waiting for connections...

time=2025-04-25T07:44:21.814+02:00 level=INFO msg="killing process" pid=41189
time=2025-04-25T07:44:21.814+02:00 level=INFO msg="killing process" pid=41189
time=2025-04-25T07:44:21.814+02:00 level=ERROR msg="failed to send sigterm" pid=41189
time=2025-04-25T07:44:21.814+02:00 level=INFO msg="process killed with term" pid=41189
time=2025-04-25T07:44:21.814+02:00 level=INFO msg="untracked process" pid=41189
  10. Deployed prod again -> It works

So maybe it was just a matter of:

  • updating SST
  • refreshing the state
  • deploying again

I probably triggered something that updated the IPs it should connect to. But now it looks fine (at the moment).
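
In command form, the sequence that seems to have re-synced the bastion IP in state (versions and stage are from my case above):

npx sst upgrade                            # update SST (3.13.14 -> 3.13.18 here)
npx sst refresh --stage prod               # refresh the state from AWS
npx sst deploy --stage prod                # deploy again so the right IP lands in state
npx sst tunnel --stage prod --print-logs   # verify the --host IP now matches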

Maybe that helps somebody

AlessandroVol23 avatar Apr 25 '25 05:04 AlessandroVol23

  9. sst refresh --stage prod -> Maybe it's updating the IPs? Not sure where they are in the state exactly -> Now the tunnel is immediately failing
  10. Deployed prod again -> It works

So maybe it was just a matter of:

  • updating SST
  • refreshing the state
  • deploying again

I probably triggered something that updated the IPs it should connect to. But now it looks fine (at the moment).

Maybe that helps somebody

Thank you for the detailed steps; I had done them all, but it was #9 and #10 that finally did the trick for me.

jeremyhicks avatar Apr 26 '25 14:04 jeremyhicks

  9. sst refresh --stage prod -> Maybe it's updating the IPs? Not sure where they are in the state exactly -> Now the tunnel is immediately failing
  10. Deployed prod again -> It works

can confirm doing 9 and 10 fixed it for me as well 🚀

note: before i did 9, i removed the .sst folder, but i'm not sure if that's related

danielfyhr avatar Apr 28 '25 06:04 danielfyhr