Divert
Performance impact?
Hi!
I'm testing the passthru example with "true" and "8" as parameters (also tried 1) and it works fine. However, copying a file over the network that usually runs at about 90 MB/s slows down to 25 MB/s. CPU load of the passthru program is between 20 and 25%.
Does WinDivert really slow down network traffic that much? Can this be improved? Other solutions, like WinpkFilter, have a much smaller impact (e.g. just 5% CPU load, just 10% drop in transfer rate).
Thanks!
I've now tested this again and indeed the performance impact is huge. On a gigabit LAN, running the passthru example slows the network speed down to about 30%! I'm losing almost 70% of the network speed on a fast machine (Intel i7). The other network packet filter "WinpkFilter" does not show that issue. It lets the traffic pass at full speed with a fraction of the CPU load WinDivert uses...
Is WinDivert really so slow?
Firstly, which version of WinDivert did you use? Some of the older versions have performance problems that have been fixed.
These are my test results for WinDivert1.2.0-rc:
direct: inbound=206Mbps outbound=206Mbps
passthru true 4: inbound=205Mbps outbound=178Mbps
200 Mbps (~25 MB/s) is less than your 90 MB/s, and I have not yet tested anything faster.
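For reference, converting between the two units is just a factor of 8 bits per byte. A minimal C sketch (the helper names are illustrative, not part of WinDivert), assuming decimal units and ignoring protocol overhead:

```c
#include <assert.h>

/* Convert megabits per second to megabytes per second.
   Assumes decimal units (1 MB = 10^6 bytes) and 8 bits per byte;
   protocol overhead is ignored, so real file transfers will be slower. */
double mbps_to_mbytes_per_sec(double mbps)
{
    assert(mbps >= 0.0);
    return mbps / 8.0;
}

/* Convert megabytes per second back to megabits per second. */
double mbytes_per_sec_to_mbps(double mbytes_per_sec)
{
    assert(mbytes_per_sec >= 0.0);
    return mbytes_per_sec * 8.0;
}
```

By the same arithmetic, 90 MB/s corresponds to roughly 720 Mbps, well above the ~200 Mbps range tested here.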
There is a performance hit for outbound traffic (206 vs 178 Mbps, about 15%). This is something I was aware of, but I have never found the exact cause. A possible culprit is the checksum recalculation, which may also explain some of the CPU usage. Unfortunately, correct checksums are a requirement of the underlying WFP framework, as far as I can tell. WinPkFilter is a lower-level NDIS intermediate driver and probably does not need checksum recalculation for a passthru-type example.
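As an illustration of what checksum recalculation involves, here is a sketch of the standard Internet checksum (RFC 1071) over an IPv4 header. This is a generic example, not WinDivert's implementation. TCP and UDP checksums use the same one's-complement sum but also cover a pseudo-header and the whole payload, so a full recomputation touches every byte of every packet.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Compute the Internet checksum (RFC 1071) over an IPv4 header.
   The caller must zero the checksum field (bytes 10-11) first.
   The result is returned as a host-order value; store it big-endian
   (high byte at offset 10, low byte at offset 11). */
uint16_t ipv4_header_checksum(const uint8_t *hdr, size_t len)
{
    uint32_t sum = 0;
    assert(hdr != NULL && len % 2 == 0);
    for (size_t i = 0; i < len; i += 2) {
        /* Add each 16-bit big-endian word. */
        sum += (uint32_t)((hdr[i] << 8) | hdr[i + 1]);
    }
    /* Fold the carries back into the low 16 bits (one's-complement add). */
    while (sum >> 16)
        sum = (sum & 0xFFFFu) + (sum >> 16);
    return (uint16_t)~sum;
}
```

A useful property for verification: recomputing the sum over a header that already contains its correct checksum yields 0.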
The other thing is that WinDivert has always been a convenience versus performance trade-off. For best performance, you are better off implementing a specialized filtering driver for your application.
I'm using WinDivert v1.1.8. It's a pity that it slows down gigabit networks, because otherwise it seems really great!
Maybe you can test it on a gigabit LAN to see for yourself.
Did you ever find anything to improve the performance? I'm currently experiencing a similar performance drop and high CPU usage. In my case my download speed goes from ~6MB/s to 4.5MB/s with 15% CPU (probably depends on the CPU). The application spends most of its time in the WinDivertSendEx method. I already increased both available parameters (WINDIVERT_PARAM_QUEUE_LEN and WINDIVERT_PARAM_QUEUE_TIME) but that does not make a lot of difference I'm afraid.
Hi!
Sorry, but no. This problem seems to be by design, and the developers don’t seem to be interested in fixing it.
Regards!
From: Areithus Sent: Thursday, 22 September 2016 11:05 To: basil00/Divert Cc: FCrane; Mention Subject: Re: [basil00/Divert] Performance impact? (#52)
Hi @FCrane and @basil00, I'm interested in writing a packet filter that will block outbound traffic from almost 162K IPs. For that purpose, instead of diverting all traffic and then filtering, I've come up with a scheme:
- For TCP, divert only SYN packets (dropping a SYN packet prevents the connection from being established); for UDP and other protocols, divert all packets. Filter: "(outbound and tcp.Syn) or (outbound and (!tcp))". Is this filter correct?
- Catch the diverted traffic (with 4-8 threads) and filter it using binary search: if a diverted packet is from a blacklisted IP, block it; otherwise, reinject it.
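A minimal sketch of the lookup step, assuming the blacklist is held as IPv4 addresses in host byte order in a plain C array. The function names are hypothetical, and extracting the address from the diverted packet (and deciding whether it is the source or destination address) is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Three-way comparison of two host-order IPv4 addresses for qsort/bsearch. */
static int cmp_ipv4(const void *a, const void *b)
{
    uint32_t x = *(const uint32_t *)a;
    uint32_t y = *(const uint32_t *)b;
    return (x > y) - (x < y);
}

/* Sort the blacklist once at startup so lookups can use binary search. */
void blacklist_prepare(uint32_t *ips, size_t count)
{
    qsort(ips, count, sizeof ips[0], cmp_ipv4);
}

/* Return nonzero if addr is present in the sorted blacklist. */
int blacklist_contains(const uint32_t *ips, size_t count, uint32_t addr)
{
    assert(ips != NULL || count == 0);
    if (count == 0)
        return 0;
    return bsearch(&addr, ips, count, sizeof ips[0], cmp_ipv4) != NULL;
}
```

Sorting once costs O(n log n); each lookup then needs at most about 18 comparisons for ~162K entries, so the per-packet lookup cost should be small compared to the receive/reinject round trip. The sorted array is only read after startup, so it can be shared by the worker threads without locking.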
Please comment on my scheme: would it work? Additionally, please let me know the tools and everything else you used to measure the performance drop. Could you help me set up an environment to measure performance? I want to test my scheme's throughput. Thanks!
Generally, you want to divert as little traffic as possible to get the job done. Diverting only SYN packets is a good approach and should have minimal impact, although it will not affect already-established TCP connections. For UDP, which has no SYN equivalent, you'd be stuck with either diverting everything or implementing something complex (e.g., updating the filter string to whitelist established UDP flows).
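To make the UDP whitelisting idea concrete, here is a sketch that builds such a filter string from a list of known-good flows. The struct and function are hypothetical, and the filter syntax used (outbound, udp, ip.DstAddr, udp.DstPort, !, and) should be checked against the WinDivert filter documentation for your version. Applying a new string presumably means opening a new WinDivert handle with it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical record for an established outbound UDP flow. */
struct udp_flow {
    uint32_t dst_addr; /* IPv4 address in host byte order */
    uint16_t dst_port;
};

/* Build a filter that diverts outbound UDP except the listed flows.
   Writes a NUL-terminated string into buf and returns 0, or returns -1
   if buf is too small. */
int build_udp_filter(const struct udp_flow *flows, size_t n,
                     char *buf, size_t size)
{
    assert(buf != NULL && size > 0);
    int w = snprintf(buf, size, "outbound and udp");
    if (w < 0 || (size_t)w >= size)
        return -1;
    size_t used = (size_t)w;

    for (size_t i = 0; i < n; i++) {
        uint32_t a = flows[i].dst_addr;
        /* Exclude this flow: !(address matches and port matches). */
        w = snprintf(buf + used, size - used,
                     " and !(ip.DstAddr == %u.%u.%u.%u and udp.DstPort == %u)",
                     (unsigned)((a >> 24) & 0xFFu), (unsigned)((a >> 16) & 0xFFu),
                     (unsigned)((a >> 8) & 0xFFu), (unsigned)(a & 0xFFu),
                     (unsigned)flows[i].dst_port);
        if (w < 0 || (size_t)w >= size - used)
            return -1;
        used += (size_t)w;
    }
    return 0;
}
```

The string grows with every whitelisted flow, so this only makes sense for a modest number of long-lived flows.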
This problem seems to be by design, and the developers don’t seem to be interested in fixing it.
Nothing can be done until #53 is fixed anyway.
@basil00, thanks for your reply. Would you please let me know the tools and everything else you used to measure the performance drop (which you and @FCrane measured and mentioned in the comments above)? Could you help me set up an environment to measure performance? I want to test my scheme's throughput.
For latency, use ping; for throughput, any file transfer tool will do (ftp, scp, or even one of the HTTP speed testers you can find by searching online).
The performance impact of WinDivert is usually minimal unless you are attempting to divert megabytes per second of data through a user application. This is especially true for latency, where the lag introduced by the user application is usually insignificant compared to the normal network lag. One danger for throughput is the WinDivert packet queue becoming overwhelmed, resulting in packet loss.
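A toy model of that failure mode: a fixed-capacity ring buffer that drops packets once it is full, standing in for a queue whose producer (the driver) outpaces its consumer (the user application). This is not WinDivert's internal queue; the real limits are governed by parameters such as WINDIVERT_PARAM_QUEUE_LEN and WINDIVERT_PARAM_QUEUE_TIME mentioned earlier in the thread, and the capacity below is a hypothetical value.

```c
#include <assert.h>
#include <stddef.h>

#define QUEUE_CAP 4 /* hypothetical capacity, kept tiny for illustration */

struct pkt_queue {
    int items[QUEUE_CAP]; /* stand-ins for queued packets */
    size_t head;          /* index of the oldest packet */
    size_t count;         /* packets currently queued */
    size_t dropped;       /* packets lost because the queue was full */
};

/* Enqueue a packet; if the queue is full, drop it and count the loss.
   Returns 1 if queued, 0 if dropped. */
int queue_push(struct pkt_queue *q, int pkt)
{
    if (q->count == QUEUE_CAP) {
        q->dropped++;
        return 0;
    }
    q->items[(q->head + q->count) % QUEUE_CAP] = pkt;
    q->count++;
    return 1;
}

/* Dequeue the oldest packet into *pkt. Returns 0 if the queue is empty. */
int queue_pop(struct pkt_queue *q, int *pkt)
{
    assert(pkt != NULL);
    if (q->count == 0)
        return 0;
    *pkt = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_CAP;
    q->count--;
    return 1;
}
```

If the consumer never catches up, every packet beyond the capacity is lost, which the sender's congestion control then sees as loss and throttles throughput accordingly.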
https://github.com/basil00/Divert/issues/52#issuecomment-137912410
It would be helpful to know whether the performance hit stays constant when transfers are throttled (handicapped) to varying degrees.
The latest WinDivert source code seems to be about 4x faster than older versions 1.1.X and 1.2.Y, at least in my quick-and-dirty testing. This might not be quite gigabit speed, but at least it is a lot closer. The better performance is mainly due to internal driver optimizations, such as avoiding copying packets (where possible) and instant injection.
Nice @basil00! 4x more throughput or 4x less CPU usage? Or both? 👍
@Areithus In my experience, nearly all of the CPU usage is in the user application; for the diversion process itself, CPU usage is negligible when using overlapped functions and tracking TCP packet flows. Also, given the context of the thread, I believe it's about throughput.
This is exciting news. I've had word, @basil00, that the EV cert was granted and is in the mail to someone I'm working with, so we should be able to sign shortly.
Yes, I meant 4x throughput, although it was a very rough test. I was testing at 1Gbps, and version 1.2.0 choked at about 170Mbps, whereas the new version managed 630Mbps (still not perfect, but much better). But this is just one quick test.
Let me know when you are ready and I can assist. My other sponsor signed version 1.3.0; it was a long and painful process, but we gained a lot of experience. From the project's perspective, there is no harm in having more than one sponsor :)
What CPU did you use, and what was its load during the test? Did you test passthru?
Yes, passthru. The test box is an old system, so it might do better with a more modern CPU.
Some benchmarks for passthru true at gigabit speeds:
------------------------------------------------------------------
Direct:
##: 0.80 Gbps down, 0.93 Gbps up
------------------------------------------------------------------
WinDivert-1.2.0-rc (#threads)
#1: 0.06 Gbps down, 0.06 Gbps up
#2: 0.12 Gbps down, 0.11 Gbps up
#3: 0.16 Gbps down, 0.15 Gbps up
#4: 0.19 Gbps down, 0.18 Gbps up
------------------------------------------------------------------
WinDivert-1.3.0 (#threads)
#1: 0.41 Gbps down, 0.49 Gbps up
#2: 0.71 Gbps down, 0.81 Gbps up
#3: 0.76 Gbps down, 0.87 Gbps up
#4: 0.73 Gbps down, 0.83 Gbps up
------------------------------------------------------------------
WinDivert-1.4.0-dev (#threads)
#1: 0.39 Gbps down, 0.46 Gbps up
#2: 0.61 Gbps down, 0.75 Gbps up
#3: 0.77 Gbps down, 0.84 Gbps up
#4: 0.74 Gbps down, 0.77 Gbps up
Notes:
- WinDivert-1.2.0 and earlier had a performance bug that limited throughput to around ~200Mbps. Coincidentally, this was my available bandwidth at the time, so the problem went unnoticed.
- The performance bug was fixed here. The fix was included in the WinDivert-1.3.0 release. The disadvantage of the fix is that WinDivertSend() will not return an error code if the injection fails (instead, the packet will silently disappear).
- The performance has slightly regressed in WinDivert-1.4.0 (although for 3 threads it is about the same). I will continue to investigate.
There is no MSVC build for WinDivert 1.3.0?
@kelvinomolumo did you check the releases page?
Nice test, @basil00. I did some testing here as well (just a little, with reading) and can confirm that 1.3.0 is faster than 1.4.0. Not just in throughput: CPU usage is also a little lower (by about 1-2%).
No; try linking against the MinGW version.
can confirm that 1.3.0 is faster than 1.4.0
Version 1.4.0 has a more complicated pipeline, so it is probably a bit slower as a result. The details are somewhat technical, but version 1.3.0 queues packets (by deep copying) at DISPATCH_LEVEL, which is not ideal: at DISPATCH_LEVEL the thread cannot be preempted, so nothing else can run on that processor until the copying has finished. Version 1.4.0 fixes this by moving the copying and filtering out of band and running them at PASSIVE_LEVEL (which is preemptible, just like normal user-mode code), but this requires an extra internal queue, so it likely adds some overhead.
A more optimal design (in terms of performance) would be not to use deep copying for queueing packets at all, but rather to keep a reference to the original packet (NET_BUFFER_LIST). However, drivers are not supposed to hold references to NET_BUFFER_LISTs for long periods, such as while waiting for a user-mode application (as is the case with WinDivert), and Microsoft specifically advises against this.
@basil00 I've been reading up a little on this (I think you refer to this specifically: https://msdn.microsoft.com/en-us/library/windows/hardware/ff551206(v=vs.85).aspx and also on some other pages such as https://msdn.microsoft.com/en-us/library/windows/hardware/ff551134(v=vs.85).aspx). Please correct me if I'm wrong though. It seems that if you listen to IRP_MN_QUERY_POWER you can keep the references. Might be worth looking into, it'd be nice to get near gbit speeds with just 2 threads.
That might be something to look into.
I also remembered that there are other complications to consider. Specifically, while deep copying sounds slow, it also has the benefit of freeing up the original buffer. This means that WSASend can complete immediately rather than blocking until WinDivert dereferences the packet, which can result in better throughput.
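The trade-off can be sketched in plain C: the hypothetical queue entry below owns a deep copy of the packet, so the sender's buffer is free for reuse as soon as the copy is made. A reference-based entry would instead store the caller's pointer, and the caller would have to wait until the consumer was done with it. This is a user-mode analogy, not driver code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* A queued packet that owns a private copy of its bytes. */
struct queued_pkt {
    unsigned char *data;
    size_t len;
};

/* Deep-copy the sender's buffer into the queue entry. Once this returns,
   the caller may reuse or free its buffer immediately (analogous to letting
   the send complete), at the cost of one allocation and one memcpy.
   Returns 0 on success, -1 if allocation fails. */
int queue_by_copy(struct queued_pkt *q, const unsigned char *buf, size_t len)
{
    assert(q != NULL && (buf != NULL || len == 0));
    q->data = malloc(len > 0 ? len : 1);
    if (q->data == NULL)
        return -1;
    if (len > 0)
        memcpy(q->data, buf, len);
    q->len = len;
    return 0;
}

/* Release the private copy once the consumer has processed the packet. */
void queued_pkt_free(struct queued_pkt *q)
{
    free(q->data);
    q->data = NULL;
    q->len = 0;
}
```

Because the entry no longer depends on the caller's memory, later changes to the original buffer do not affect the queued packet.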
The latest WinDivert-1.4-dev has reverted to deep copying rather than referencing packets. It appears this mode is actually slightly faster:
#3: 0.80 Gbps down, 0.87 Gbps up
So there is no reason not to continue using this mode for the immediate future. I hope to release version 1.4 shortly.
@basil00, is WinDivertSend also back to how it worked in v1.3, i.e., returning no error if injection fails?
Since version 1.3.0, the WinDivertSend function returns immediately, since this is a lot faster. If you prefer to wait for an error code, it is possible to pass the WINDIVERT_FLAG_DEBUG flag to WinDivertOpen, and this will emulate the old behavior. Note that, in my experience, Windows often does not return an error if injection fails either way...
I had evaluated the WinDivert 2.0 performance as part of testing, so it is probably worthwhile to make some quick notes here.
One problem was that I was unable to replicate the previous performance numbers for older versions of WinDivert. It is possible that WinDivert performance took a hit from the Meltdown mitigation, especially since my test box uses older hardware. I was also unable to replicate the top speeds for the unfiltered connection, which may be related, or may have been a temporary network issue.
Nevertheless, we can evaluate the relative performance of WinDivert 2.0: it essentially matches 1.4.3 using the same parameters (i.e., the same thread count), which is in line with expectations.
WinDivert 2.0 also introduces "batch mode" via the WinDivert...Ex() functions. This allows the user application to send/receive multiple packets at once, and significantly reduces the number of kernel/user-mode context switches required. In my experiments, batch mode can significantly improve performance, even for single-threaded applications. Using the 2.0 version of passthru, the following configuration (a single thread and a batch size of 32) can run at "full speed" (~0.83 Gbps filtered vs ~0.93 Gbps unfiltered):
passthru.exe true 1 32
This suggests that "batch mode" is the most important factor in terms of performance improvement in recent versions of WinDivert.
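A rough model of why batching helps: if each receive call and each send call handles up to a batch of packets, the number of user/kernel transitions falls roughly in proportion to the batch size. The helper below is hypothetical and deliberately simplified; it ignores per-call cost differences and the overlapped I/O details of the real WinDivert...Ex() functions.

```c
#include <assert.h>
#include <stddef.h>

/* Number of user/kernel calls needed to pass n_packets through a user-mode
   filter when every receive call and every send call handles up to `batch`
   packets. Assumes one receive and one send call per batch. */
size_t user_kernel_calls(size_t n_packets, size_t batch)
{
    assert(batch > 0);
    size_t batches = (n_packets + batch - 1) / batch; /* ceiling division */
    return 2 * batches;
}
```

For 1000 packets, going from a batch size of 1 to 32 cuts the number of calls from 2000 to 64 under this model.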
Hi basil,
I am running into this performance issue now. Specifically, we care about SMB performance (i.e., network shares). Test environment: two Windows virtual machines (Win10 and Win7) connected to each other under Parallels Desktop.
Without WinDivert, the file copy speed can reach 150 MByte/s (the virtual network adapter is 10Gbps).
With WinDivert's passthru.exe (both 1.4.3 and 2.0.0 rc), the best speed I can get is 50 MByte/s.
The best performance with passthru is with 1 or 2 threads:
passthru true 1 (for 1.4.3)
passthru true 1 32 (for 2.0 rc)
Although this was tested on virtual machines, I got similar results on physical machines with a 1Gbps connection.
Could you please shed some light on this? How can I debug this issue?
Thanks
Just so you know, it's a known issue to some of us that SMB and other Windows services suffer degraded performance. However, I haven't retested with batching.
Personally I exempt such traffic but that may not be an option based on your use case.