shadowsocks-rust

Windows server side frequently hangs

Open moqi2011 opened this issue 2 years ago • 16 comments

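The version is shadowsocks-v1.14.3.x86_64-pc-windows-gnu.zip.

After running for a while, the server hangs without any warning and clients can no longer connect. When I then press ctrl+c on the server, the program does not exit. Instead it prints some connection-failure log lines and then returns to normal.

Below is the log printed after pressing ctrl+c: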

2022-07-17T17:41:50.793324300+08:00 ERROR tcp tunnel 192.168.1.3:54210 -> appcloud2.in.zhihu.com:443 connect failed, error: dns resolve appcloud2.in.zhihu.com:443 error: 不知道这样的主机。 (os error 11001)
2022-07-17T17:55:05.862293900+08:00 ERROR tcp tunnel 192.168.1.3:54304 -> p214-acsegateway.icloud.com.cn:443 connect failed, error: dns resolve p214-acsegateway.icloud.com.cn:443 error: 不知道这样的主机。 (os error 11001)
2022-07-17T17:55:05.992211500+08:00 ERROR tcp tunnel 192.168.1.3:54357 -> p214-acsegateway.icloud.com.cn:443 connect failed, error: dns resolve p214-acsegateway.icloud.com.cn:443 error: 不知道这样的主机。 (os error 11001)

moqi2011 avatar Jul 17 '22 10:07 moqi2011

os error 11001

The socket library returned error code 11001 when ssserver was trying to resolve domain names. I am not familiar with Windows, but it seems that 11001 is related to Windows' DNS configuration: https://www.remoteutilities.com/support/kb/socket-error-11001-host-not-found/
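As a rough illustration of what this error code means, the sketch below resolves a name with the Rust standard library and checks for code 11001 (WSAHOST_NOT_FOUND in Winsock). It is not taken from shadowsocks-rust; `resolve`, `is_host_not_found`, and `describe_lookup` are hypothetical helper names, and on non-Windows systems a failed lookup normally carries no raw OS code at all.

```rust
use std::io;
use std::net::{SocketAddr, ToSocketAddrs};

/// Winsock error code WSAHOST_NOT_FOUND ("No such host is known").
const WSAHOST_NOT_FOUND: i32 = 11001;

/// Returns true when the error carries the Windows "host not found" code.
fn is_host_not_found(err: &io::Error) -> bool {
    err.raw_os_error() == Some(WSAHOST_NOT_FOUND)
}

/// Resolve "host:port" with the operating system resolver (hypothetical helper).
fn resolve(target: &str) -> io::Result<Vec<SocketAddr>> {
    Ok(target.to_socket_addrs()?.collect())
}

/// Describe the outcome of one lookup. A failed lookup only affects this
/// one target; the caller can carry on with the next request.
fn describe_lookup(target: &str) -> String {
    match resolve(target) {
        Ok(addrs) => format!("{} resolved to {} address(es)", target, addrs.len()),
        Err(ref e) if is_host_not_found(e) => format!("{}: host not found, skipping", target),
        Err(e) => format!("{}: lookup failed: {}", target, e),
    }
}
```

Calling `describe_lookup("appcloud2.in.zhihu.com:443")` on a machine whose resolver cannot find the name would take the "host not found" branch on Windows and the generic failure branch elsewhere.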

zonyitoo avatar Jul 17 '22 12:07 zonyitoo

It's normal for a network access error to occur, but why would it cause a denial of service for the entire program? Is there any way for it to ignore this error and continue to serve subsequent requests?

moqi2011 avatar Jul 17 '22 15:07 moqi2011

Well, in this case, there is probably something wrong with your DNS resolver that prevents the trust-dns resolver from resolving domain names. The trust-dns resolver has a default timeout of 5 seconds, so every connection may have to wait at least 5 seconds before it learns that DNS resolution has failed.

https://github.com/shadowsocks/shadowsocks-rust/blob/f533195d015948268d99f537f89cc2f061a30869/crates/shadowsocks-service/src/dns/mod.rs#L17-L23
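To make the effect of such a timeout concrete, here is a minimal standard-library sketch of running a blocking lookup with a deadline. It is only an analogy: shadowsocks-rust uses an async resolver, and `run_with_timeout` is a hypothetical helper, not part of the project.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Run a blocking job on a helper thread and stop waiting after `timeout`.
/// Returns None if the job did not finish in time (or panicked). The helper
/// thread is not cancelled; it is simply left to finish on its own.
fn run_with_timeout<T, F>(job: F, timeout: Duration) -> Option<T>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The receiver may already be gone if the caller timed out;
        // that send error is deliberately ignored.
        let _ = tx.send(job());
    });
    rx.recv_timeout(timeout).ok()
}
```

With a 5 second timeout, a caller whose lookup never answers gets `None` after 5 seconds, which is the per-connection delay described above.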

You may try to start the ssserver with environment variable SS_SYSTEM_DNS_RESOLVER_FORCE_BUILTIN=1 to use system builtin DNS resolution API and see if you can observe the same problem.
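If the server is started from another program rather than typed into a shell, the variable can be set on the child process. The sketch below assumes `ssserver` is on the PATH and that the configuration file path is passed with `-c`; `ssserver_command` is a hypothetical helper and the config path is a placeholder.

```rust
use std::process::Command;

/// Build a command that starts ssserver with the builtin system resolver forced on.
/// `config_path` is a placeholder for your own configuration file.
fn ssserver_command(config_path: &str) -> Command {
    let mut cmd = Command::new("ssserver");
    cmd.env("SS_SYSTEM_DNS_RESOLVER_FORCE_BUILTIN", "1")
        .arg("-c")
        .arg(config_path);
    cmd
}
```

Calling `ssserver_command("config.json").status()` would then start the server and wait for it to exit.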

zonyitoo avatar Jul 17 '22 16:07 zonyitoo

This configuration may solve the DNS problem, but the root cause has not been addressed. Other network problems also seem to cause a denial of service for the whole program. For example, with the log below, the program returned to normal when I pressed ctrl+c.

2022-07-19T11:53:22.930130600+08:00 WARN  handshake failed, maybe wrong method or key, or under replay attacks. peer: 194.247.178.81:43856, error: invalid tag-in
2022-07-19T13:25:06.097286700+08:00 WARN  handshake failed, maybe wrong method or key, or under replay attacks. peer: 192.241.219.61:43512, error: invalid tag-in
2022-07-19T13:25:06.098302100+08:00 WARN  handshake failed, maybe wrong method or key, or under replay attacks. peer: 192.241.219.80:60240, error: invalid tag-in
2022-07-19T13:25:06.099500100+08:00 WARN  handshake failed, maybe wrong method or key, or under replay attacks. peer: 210.3.15.174:56665, error: invalid tag-in

The root cause seems to be that it cannot handle multiple client requests at the same time. Where can I configure the number of threads? @zonyitoo Thanks for your answer.

moqi2011 avatar Jul 19 '22 15:07 moqi2011

I don't think it is related to the number of threads, because the whole program runs many coroutines on multiple threads.

https://github.com/shadowsocks/shadowsocks-rust/blob/master/src/monitor/windows.rs#L9

The ctrl-c signal should be captured by this task, which then kills the program entirely. So what exactly was triggered when you pressed ctrl+c? I am not a Windows expert, so I really don't know what was happening.

Did you enable fast_open? Try disabling it and test again.

zonyitoo avatar Jul 19 '22 15:07 zonyitoo

I don't have fast_open enabled. I also think pressing ctrl+c should stop the program, but that is not what happens. Even after it returns to normal, pressing ctrl+c sometimes fails to exit the program. It takes about three presses to exit.

moqi2011 avatar Jul 19 '22 16:07 moqi2011

Interesting. So who consumed the ctrl+c signal? Do you have any ideas?

zonyitoo avatar Jul 19 '22 16:07 zonyitoo

Did the log component cause a deadlock? Every time it gets stuck, there is log output when I press ctrl+c.

moqi2011 avatar Jul 19 '22 16:07 moqi2011

Normally a log line should be printed when the event happens, but the timestamps above show that those events did not happen at the moment I pressed ctrl+c. While the program is stuck, there are no logs apart from the message that the port was opened for listening successfully, but when I press ctrl+c I lose those logs.

moqi2011 avatar Jul 19 '22 16:07 moqi2011

https://github.com/shadowsocks/shadowsocks-rust/blob/master/Cargo.toml#L58

You could try to compile a binary without any logging facilities. Just remove this "logging" feature and compile.

zonyitoo avatar Jul 19 '22 16:07 zonyitoo

OK, thanks.

moqi2011 avatar Jul 19 '22 16:07 moqi2011

The symptoms match, so just in case you didn't already know: if you are using the legacy conhost (not the new Windows Terminal or Git Bash), selecting text suspends the running program, and pressing ^C clears the selection and resumes execution.

database64128 avatar Jul 19 '22 16:07 database64128

After disabling the log module, it ran stably for 24 hours without hanging.

moqi2011 avatar Jul 23 '22 06:07 moqi2011

Since you have disabled the log module, nothing is written to the console while the program is running. So your problem must be related to console output. Did you check the "legacy conhost" behavior that database64128 mentioned?
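One way to see why console output can stall a whole server: a write to a suspended console blocks, and any task that logs then waits behind it. The sketch below shows a generic way to decouple workers from the console with a bounded queue and a single writer thread. It is an illustration only, not how shadowsocks-rust logs; `QueueLogger` and `spawn_console_writer` are hypothetical names.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender};
use std::thread;

/// Hypothetical logger handle: workers push lines into a bounded queue.
struct QueueLogger {
    tx: SyncSender<String>,
}

impl QueueLogger {
    /// Create a logger plus the receiving end for the console writer.
    fn new(capacity: usize) -> (QueueLogger, Receiver<String>) {
        let (tx, rx) = sync_channel(capacity);
        (QueueLogger { tx }, rx)
    }

    /// Queue a line without blocking. Returns false if the line was dropped
    /// because the queue is full or the writer has gone away.
    fn log(&self, line: &str) -> bool {
        self.tx.try_send(line.to_string()).is_ok()
    }
}

/// Spawn the only thread that touches the console. If the console stalls,
/// this thread blocks in println! while the workers keep running.
fn spawn_console_writer(rx: Receiver<String>) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        for line in rx {
            println!("{}", line);
        }
    })
}
```

The trade-off is that log lines are dropped once the queue fills up, which is usually preferable to a proxy that stops serving clients.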

zonyitoo avatar Jul 24 '22 18:07 zonyitoo

My system is Windows Server 2019. I tried both cmd and PowerShell. After starting the program I switched it to the background. I didn't do anything on the computer before it got stuck.

moqi2011 avatar Jul 25 '22 02:07 moqi2011

Disable Quick Edit Mode in the console options.
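In the console window this is the Quick Edit Mode checkbox under Properties, Options. The same setting can also be cleared from code through the Windows console API. The following is a sketch under stated assumptions, not something shadowsocks-rust is known to do: it declares the kernel32 functions by hand instead of using a bindings crate, and `without_quick_edit` and `disable_quick_edit` are hypothetical helper names.

```rust
/// Console input mode flags from the Windows console API.
const ENABLE_QUICK_EDIT_MODE: u32 = 0x0040;
const ENABLE_EXTENDED_FLAGS: u32 = 0x0080;

/// Compute the input mode with Quick Edit cleared.
/// ENABLE_EXTENDED_FLAGS must be set for the change to take effect.
fn without_quick_edit(mode: u32) -> u32 {
    (mode & !ENABLE_QUICK_EDIT_MODE) | ENABLE_EXTENDED_FLAGS
}

#[cfg(windows)]
mod console {
    use std::ffi::c_void;
    use std::io;

    /// STD_INPUT_HANDLE is defined as (DWORD)-10.
    const STD_INPUT_HANDLE: u32 = -10i32 as u32;

    #[link(name = "kernel32")]
    extern "system" {
        fn GetStdHandle(n_std_handle: u32) -> *mut c_void;
        fn GetConsoleMode(handle: *mut c_void, mode: *mut u32) -> i32;
        fn SetConsoleMode(handle: *mut c_void, mode: u32) -> i32;
    }

    /// Turn off Quick Edit for the console attached to this process.
    /// Fails if standard input is not a console (for example when redirected).
    pub fn disable_quick_edit() -> io::Result<()> {
        unsafe {
            let handle = GetStdHandle(STD_INPUT_HANDLE);
            let mut mode = 0u32;
            if GetConsoleMode(handle, &mut mode) == 0 {
                return Err(io::Error::last_os_error());
            }
            if SetConsoleMode(handle, super::without_quick_edit(mode)) == 0 {
                return Err(io::Error::last_os_error());
            }
        }
        Ok(())
    }
}
```

On Windows, calling `console::disable_quick_edit()` early at startup would stop an accidental click in the window from selecting text and suspending the program. On other platforms the module is compiled out.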

dev4u avatar Jul 25 '22 08:07 dev4u

disable Quick Edit Mode in console option

This is the correct answer.

moqi2011 avatar Jan 25 '24 19:01 moqi2011