3850 Improve speed
I have found that HTTP transfers on 3850s (and probably 3650s) running Denali and later take forever; mine takes almost 6 hours. Increasing the TCP window size helps tremendously:
ip tcp window-size 1073741823
ip tcp path-mtu-discovery
I would try a few window sizes; this is the maximum window size allowed. (Maybe use try/except and decrease it if there's an I/O error. I tried it on a 2960X and it failed a few times, so I'll need to do further testing.)
Below is an example of a file transfer across the Atlantic: only 291.601 seconds, where before it took 5+ hours.
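The try-and-decrease idea could be sketched roughly as below. This is only a sketch: `attempt_transfer` is a hypothetical callable (not part of any library) that would configure the given window size on the device and run the copy, raising `OSError` on an I/O failure. The candidate sizes other than the maximum are arbitrary examples.

```python
from typing import Callable, Iterable, Optional


def first_working_window(
    candidates: Iterable[int],
    attempt_transfer: Callable[[int], None],
) -> Optional[int]:
    """Try window sizes largest-first; return the first one that transfers cleanly."""
    for size in sorted(candidates, reverse=True):
        try:
            # Hypothetical hook: set "ip tcp window-size <size>", then run the copy.
            attempt_transfer(size)
        except OSError as exc:  # IOError is an alias of OSError in Python 3
            print(f"window-size {size} failed: {exc}")
            continue
        return size
    return None  # every candidate failed


if __name__ == "__main__":
    # Stand-in for a real device: pretend anything above 65535 hits an I/O error.
    def fake_transfer(size: int) -> None:
        if size > 65535:
            raise OSError("simulated I/O error")

    print(first_working_window([1073741823, 1048576, 65535, 4128], fake_transfer))
```

Starting from the maximum and stepping down means a platform that accepts the large window (3850/3650 in my experience) pays no penalty, while one that fails gets a smaller size on the next attempt.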
-CSWT02#copy http://10.95.238.50/cat3k_caa-universalk9.16.08.01a.SPA.bin flash:
Destination filename [cat3k_caa-universalk9.16.08.01a.SPA.bin]?
Accessing http://10.95.238.50/cat3k_caa-universalk9.16.08.01a.SPA.bin...
Loading http://10.95.238.50/cat3k_caa-universalk9.16.08.01a.SPA.bin
bytes copied in 291.601 secs (1460654 bytes/sec)
Thanks for the feedback.
Does ip tcp window-size apply to management traffic, or all traffic?
As far as I know it's only affecting traffic destined to the switch itself.
TCP window size is ideally set to the bandwidth-delay product (link bitrate x round-trip latency). Setting the window size too high can cause memory issues, and any packet loss will then cause longer delays.
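As a rough illustration of that rule of thumb, here is a minimal sketch that computes the product in bytes and caps it at the maximum `ip tcp window-size` value quoted earlier in this thread. The function names are made up for this example, and the 100 Mbit/s bandwidth is an assumption, since the real link rate is exactly the part that is hard to infer. The 72 ms figure is the average RTT from the ping posted in this thread.

```python
IOS_MAX_WINDOW = 1073741823  # maximum window size quoted earlier in this thread


def bdp_bytes(bandwidth_bps: int, rtt_ms: int) -> int:
    """Bandwidth-delay product in bytes.

    bits/sec * ms / 1000 gives bits in flight; / 8 converts to bytes.
    Integer arithmetic avoids floating-point rounding surprises.
    """
    return bandwidth_bps * rtt_ms // 8000


def suggested_window(bandwidth_bps: int, rtt_ms: int) -> int:
    """Clamp the bandwidth-delay product to the range 1..IOS_MAX_WINDOW."""
    return max(1, min(bdp_bytes(bandwidth_bps, rtt_ms), IOS_MAX_WINDOW))


if __name__ == "__main__":
    # Assumed 100 Mbit/s path with a 72 ms round-trip time.
    print(suggested_window(100_000_000, 72))  # prints 900000
```

Under those assumptions the ideal window is about 900 KB, far below the 1 GB maximum, which is one reason to test a few smaller values before settling on the largest one.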
I can programmatically determine latency with a ping test, but link bandwidth is tricky. I can't infer line rate of attached interface.
Will you run some tests and see what works well for your bandwidth and latency?
In the past, adjusting the TCP window size hasn't consistently worked. This is in the context of SCP transfers using Netmiko.
Some related discussion here:
https://github.com/ktbyers/netmiko/issues/491
I also recently did an upgrade of a couple of IOS-XE devices from 16.08.01 to 16.12.03 using Ansible's net_put and the TCP window size did not make a performance difference (IIRC). It was taking over 3 hours to transfer a 542MB file to the device.
ip ssh bulk-mode in IOS-XE 17 looks interesting (this is from @edurguti )
It would be interesting to get more data on some of these performance problems and workarounds as the default transfer speeds are crazy slow...
@ktbyers My testing on the cat3k series suggests the limit is the control plane CPU. Even plain FTP pegs the CPU at <500Kbps, only a marginal improvement over TFTP.
I looked for a way to force a weaker (and therefore less CPU-intensive) cipher suite for SCP, but didn't turn up anything.
Hi, I just tested again with http:
ip tcp window-size 1073741823
ip tcp path-mtu-discovery

NASGH3850-10-121-CSWT02#copy http://10.95.238.50/cat3k_caa-universalk9.16.09.05.SPA.bin flash:
Destination filename [cat3k_caa-universalk9.16.09.05.SPA.bin]?
Accessing http://10.95.238.50/cat3k_caa-universalk9.16.09.05.SPA.bin...
Loading http://10.95.238.50/cat3k_caa-universalk9.16.09.05.SPA.bin
bytes copied in 1268.110 secs (355082 bytes/sec)

NASGH3850-10-121-CSWT02#dir flash: | i 16.09
64643 -rw- 450283034 Jun 16 2020 07:30:46 -05:00 cat3k_caa-universalk9.16.09.05.SPA.bin
I have another one running without adjusting the TCP window size, and it's still going. Below is the latency:
NASGH3850-10-121-CSWT02#ping 10.95.238.50 source vlan 101
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 10.95.238.50, timeout is 2 seconds:
Packet sent with a source address of 10.89.228.3
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 70/72/80 ms
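For determining latency programmatically, the summary line of a ping like this one can be pulled apart with a regular expression. This is a minimal sketch: `parse_ios_ping_rtt` is a name made up for this example, and it assumes the output has already been collected as a string (for instance from whatever SSH library is in use) and ends with the `round-trip min/avg/max` line in the format shown here.

```python
import re
from typing import Optional, Tuple

# Matches the summary line, e.g. "round-trip min/avg/max = 70/72/80 ms"
RTT_RE = re.compile(r"round-trip min/avg/max = (\d+)/(\d+)/(\d+) ms")


def parse_ios_ping_rtt(output: str) -> Optional[Tuple[int, int, int]]:
    """Return (min, avg, max) RTT in ms, or None if no summary line is found."""
    match = RTT_RE.search(output)
    if match is None:
        return None  # e.g. 0 percent success, so no RTT summary is printed
    rtt_min, rtt_avg, rtt_max = (int(group) for group in match.groups())
    return rtt_min, rtt_avg, rtt_max


if __name__ == "__main__":
    sample = (
        "Success rate is 100 percent (5/5), "
        "round-trip min/avg/max = 70/72/80 ms"
    )
    print(parse_ios_ping_rtt(sample))  # prints (70, 72, 80)
```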
@edurguti I guess the broader question is--is there any pattern when we can expect changing the TCP window size will work versus when it won't (as it seems to work in some cases, but not in others)?
And I don't really have a good explanation as to why except for some vague "maybe something platform or OS-version specific".
Let me see if I can test on my ISR2 router(s) and see whether your exact change works there or not.
I never had it fail on a 3850/3650, but on a 2960X it does fail.
Here's how long it took on the one where I didn't change the window size:
NASGH3850-10-120-CSW(config)#end
NASGH3850-10-120-CSWT01#copy http://10.95.238.50/cat3k_caa-universalk9.16.09.05.SPA.bin flash:
Destination filename [cat3k_caa-universalk9.16.09.05.SPA.bin]?
Accessing http://10.95.238.50/cat3k_caa-universalk9.16.09.05.SPA.bin...
Loading http://10.95.238.50/cat3k_caa-universalk9.16.09.05.SPA.bin
bytes copied in **10066.090** secs (44733 bytes/sec)
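One way to sanity-check these numbers: with a single window in flight per round trip, throughput cannot exceed window size divided by RTT. The sketch below compares that ceiling with the two rates reported in this thread. The 4128-byte default window is an assumption (it is the commonly cited IOS default and has not been verified on this platform or version); the 72 ms RTT and both observed rates come from the outputs above.

```python
def window_limited_rate(window_bytes: int, rtt_ms: int) -> int:
    """Upper bound on throughput in bytes/sec: one full window per round trip."""
    return window_bytes * 1000 // rtt_ms


ASSUMED_DEFAULT_WINDOW = 4128  # ASSUMPTION: commonly cited IOS default, unverified here
RTT_MS = 72                    # average RTT from the ping in this thread

OBSERVED_DEFAULT = 44733       # bytes/sec, copy without the window change
OBSERVED_TUNED = 355082        # bytes/sec, copy with the window change

ceiling = window_limited_rate(ASSUMED_DEFAULT_WINDOW, RTT_MS)
speedup = OBSERVED_TUNED / OBSERVED_DEFAULT

print(f"window-limited ceiling: {ceiling} bytes/sec")
print(f"observed with default window: {OBSERVED_DEFAULT} bytes/sec")
print(f"speed-up from tuning: {speedup:.1f}x")
```

If the assumed default is right, the untuned transfer sits just under the window-limited ceiling, which would be consistent with the window rather than the CPU being the bottleneck on this particular path. It does not explain the cases where tuning made no difference.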
@ktbyers @edurguti can either of you post the output of sh proc cpu sort during the transfer?
I took a few while transferring:
NASGH3850-10-121-CSWT02#show processes cpu
Core 0: CPU utilization for five seconds: 6%; one minute: 16%; five minutes: 10%
Core 1: CPU utilization for five seconds: 0%; one minute: 7%; five minutes: 3%
Core 2: CPU utilization for five seconds: 1%; one minute: 4%; five minutes: 2%
Core 3: CPU utilization for five seconds: 1%; one minute: 2%; five minutes: 2%
Core 4: CPU utilization for five seconds: 0%; one minute: 3%; five minutes: 2%
Core 5: CPU utilization for five seconds: 0%; one minute: 1%; five minutes: 1%