go-http-tunnel icon indicating copy to clipboard operation
go-http-tunnel copied to clipboard

Connection closing after aprox. 15 minutes

Open rdelvalle opened this issue 6 years ago • 13 comments

Hello. I've a client that after 15 minutes disconnects. In the client I don't see any log. This is what I see in the server:

2018/03/13 10:17:26 level 2 action open listener identifier UJ72CAB-5LW5RVP-STCEAVM-TSJ2DSV-OM3PPAK-HY3Q5CL-L7Z2NNB-LRGQ7AB addr [::]:121
2018/03/13 10:17:26 level 2 action open listener identifier UJ72CAB-5LW5RVP-STCEAVM-TSJ2DSV-OM3PPAK-HY3Q5CL-L7Z2NNB-LRGQ7AB addr [::]:122
2018/03/13 10:17:26 level 2 action open listener identifier UJ72CAB-5LW5RVP-STCEAVM-TSJ2DSV-OM3PPAK-HY3Q5CL-L7Z2NNB-LRGQ7AB addr [::]:123
2018/03/13 10:17:26 level 2 action open listener identifier UJ72CAB-5LW5RVP-STCEAVM-TSJ2DSV-OM3PPAK-HY3Q5CL-L7Z2NNB-LRGQ7AB addr [::]:124
2018/03/13 10:17:26 level 2 action set registry item identifier UJ72CAB-5LW5RVP-STCEAVM-TSJ2DSV-OM3PPAK-HY3Q5CL-L7Z2NNB-LRGQ7AB
2018/03/13 10:17:26 level 1 action connected addr 80.174.238.86:60705 identifier UJ72CAB-5LW5RVP-STCEAVM-TSJ2DSV-OM3PPAK-HY3Q5CL-L7Z2NNB-LRGQ7AB
2018/03/13 10:18:03 level 2 action proxy conn identifier UJ72CAB-5LW5RVP-STCEAVM-TSJ2DSV-OM3PPAK-HY3Q5CL-L7Z2NNB-LRGQ7AB ctrlMsg &{proxy [::]:121 tcp }
2018/03/13 10:18:04 level 3 action transferred bytes 5089 dir user to client dst UJ72CAB-5LW5RVP-STCEAVM-TSJ2DSV-OM3PPAK-HY3Q5CL-L7Z2NNB-LRGQ7AB src 89.246.69.218:63109
2018/03/13 10:18:04 level 3 action transferred bytes 7910 dir client to user dst 89.246.69.218:63109 src UJ72CAB-5LW5RVP-STCEAVM-TSJ2DSV-OM3PPAK-HY3Q5CL-L7Z2NNB-LRGQ7AB
2018/03/13 10:18:04 level 2 action proxy conn done identifier UJ72CAB-5LW5RVP-STCEAVM-TSJ2DSV-OM3PPAK-HY3Q5CL-L7Z2NNB-LRGQ7AB ctrlMsg &{proxy [::]:121 tcp }
2018/03/13 10:18:31 level 2 action proxy conn identifier UJ72CAB-5LW5RVP-STCEAVM-TSJ2DSV-OM3PPAK-HY3Q5CL-L7Z2NNB-LRGQ7AB ctrlMsg &{proxy [::]:121 tcp }
2018/03/13 10:18:33 level 3 action transferred bytes 5224 dir user to client dst UJ72CAB-5LW5RVP-STCEAVM-TSJ2DSV-OM3PPAK-HY3Q5CL-L7Z2NNB-LRGQ7AB src 89.246.69.218:63113
2018/03/13 10:18:33 level 3 action transferred bytes 8062 dir client to user dst 89.246.69.218:63113 src UJ72CAB-5LW5RVP-STCEAVM-TSJ2DSV-OM3PPAK-HY3Q5CL-L7Z2NNB-LRGQ7AB
2018/03/13 10:18:33 level 2 action proxy conn done identifier UJ72CAB-5LW5RVP-STCEAVM-TSJ2DSV-OM3PPAK-HY3Q5CL-L7Z2NNB-LRGQ7AB ctrlMsg &{proxy [::]:121 tcp }
2018/03/13 10:34:14 level 1 action disconnected identifier UJ72CAB-5LW5RVP-STCEAVM-TSJ2DSV-OM3PPAK-HY3Q5CL-L7Z2NNB-LRGQ7AB
2018/03/13 10:34:14 level 2 action clear registry item identifier UJ72CAB-5LW5RVP-STCEAVM-TSJ2DSV-OM3PPAK-HY3Q5CL-L7Z2NNB-LRGQ7AB
2018/03/13 10:34:14 level 2 action close listener identifier UJ72CAB-5LW5RVP-STCEAVM-TSJ2DSV-OM3PPAK-HY3Q5CL-L7Z2NNB-LRGQ7AB addr [::]:121
2018/03/13 10:34:14 level 2 action close listener identifier UJ72CAB-5LW5RVP-STCEAVM-TSJ2DSV-OM3PPAK-HY3Q5CL-L7Z2NNB-LRGQ7AB addr [::]:122
2018/03/13 10:34:14 level 2 action close listener identifier UJ72CAB-5LW5RVP-STCEAVM-TSJ2DSV-OM3PPAK-HY3Q5CL-L7Z2NNB-LRGQ7AB addr [::]:123
2018/03/13 10:34:14 level 2 action close listener identifier UJ72CAB-5LW5RVP-STCEAVM-TSJ2DSV-OM3PPAK-HY3Q5CL-L7Z2NNB-LRGQ7AB addr [::]:124
2018/03/13 10:34:14 level 2 action listener closed identifier UJ72CAB-5LW5RVP-STCEAVM-TSJ2DSV-OM3PPAK-HY3Q5CL-L7Z2NNB-LRGQ7AB addr [::]:124
2018/03/13 10:34:14 level 2 action listener closed identifier UJ72CAB-5LW5RVP-STCEAVM-TSJ2DSV-OM3PPAK-HY3Q5CL-L7Z2NNB-LRGQ7AB addr [::]:121
2018/03/13 10:34:14 level 2 action listener closed identifier UJ72CAB-5LW5RVP-STCEAVM-TSJ2DSV-OM3PPAK-HY3Q5CL-L7Z2NNB-LRGQ7AB addr [::]:122
2018/03/13 10:34:14 level 2 action listener closed identifier UJ72CAB-5LW5RVP-STCEAVM-TSJ2DSV-OM3PPAK-HY3Q5CL-L7Z2NNB-LRGQ7AB addr [::]:123

I tried with backoff: max_time: 0 but there was no difference.

Thanks!

rdelvalle avatar Mar 13 '18 10:03 rdelvalle

There must be some proxy in between that is cutting the connection.

mmatczuk avatar Mar 13 '18 11:03 mmatczuk

How can I debug that?

rdelvalle avatar Mar 13 '18 11:03 rdelvalle

I think the issue is that the client doesn't know it's disconnected, so it doesn't retry.

rdelvalle avatar Mar 13 '18 13:03 rdelvalle

@mmatczuk Could it be that the keepalive for windows is not working? I'm using a tcp connection on Windows 7. Seems it closes after 15 minutes of no activity.

rdelvalle avatar Mar 13 '18 14:03 rdelvalle

Are you running a release version or you build it yourself?

mmatczuk avatar Mar 13 '18 14:03 mmatczuk

@mmatczuk I'm running the latest release windows_amd64

rdelvalle avatar Mar 13 '18 15:03 rdelvalle

I also tried the x86 version without success.

rdelvalle avatar Mar 14 '18 07:03 rdelvalle

It's about the windows built. I'd suggest moving to linux :)

mmatczuk avatar Mar 14 '18 10:03 mmatczuk

I myself use Linux and MacOS, but unfortunately, the computer where I want to add the tunnel uses Windows and I'll need to deal with it 😢. Do you think the problem is of the KeepAlive that the app is using or it's about the implementation of it?

rdelvalle avatar Mar 14 '18 10:03 rdelvalle

You would need to debug it, from what I see https://github.com/mmatczuk/go-http-tunnel/blob/master/keepalive_windows.go#L12 could be more like https://github.com/gocql/gocql/blob/2e9f2912ba58e37f34af3554659b576eac655760/conn.go#L1130. You can also experiment with the timing https://github.com/mmatczuk/go-http-tunnel/blob/master/keepalive.go#L19.

mmatczuk avatar Mar 14 '18 10:03 mmatczuk

@rdelvalle try my heartbeat branch: https://github.com/tianchaijz/go-http-tunnel

tianchaijz avatar Nov 07 '18 06:11 tianchaijz

@tianchaijz it is cool. I carefully read your code, and I think your implementation is quite good, why not make a pull request to merge back :)

I made such implementation using zeromq years ago, it is quite necessary for such a program.

the heartbeat version can easily pass the weak network test(switch network, close and reconnect network etc)

@mmatczuk I just learn go recently, really happy to see such a wonderful/high quality open source projects.

Thanks 👍

ice6 avatar Jan 23 '19 02:01 ice6

@tianchaijz It would be awesome if you could PR your heartbeat work back to this project.

We're seeing this issue running tunneld in AWS, with clients connecting through an NLB. NLB disconnects clients that don't send any data for 350 seconds (source), so our clients get disconnected and have to be restarted if no requests are routed to them in that timeframe.

apottere avatar Apr 08 '20 15:04 apottere