FrameworkBenchmarks
Docker 'Bridge' networks degrade whole-system network throughput and may bottleneck benchmarks
OS (Please include kernel version)
Linux -- Ubuntu 20.04.2 LTS -- 5.11.7-051107-generic
Expected Behavior
Docker network configuration does not significantly degrade / bottleneck network throughput
Actual Behavior
Docker reduces framework throughput by ~20% just by being installed, even when not in use, and may also be bottlenecking the benchmark
Steps to reproduce behavior
- First, benchmark desired framework with wrk and without docker installed. Note throughput of framework in benchmark, as well as latency
- Apt install docker-ce (which also installs docker-ce-cli and containerd.io)
- Benchmark framework as before, not even using docker but with it installed, note reduction in performance
- (optional) Run framework inside docker container of choice, I used bullseye-slim. Run wrk on host against framework running in the container and note performance is roughly the same as the previous step. Also note this does not change much when specifying --network host
- Stop the docker service, then uninstall docker. Restart the system (was necessary for me) then run the benchmark, notice performance is restored
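To compare the runs from the steps above, something like the following could be used. It is only a sketch: it assumes the standard wrk summary output containing a "Requests/sec:" line, the function names are made up for illustration, and the sample numbers are invented, not measurements from this issue.

```python
import re


def requests_per_sec(wrk_output: str) -> float:
    """Extract the 'Requests/sec' figure from wrk's summary output."""
    match = re.search(r"Requests/sec:\s*([\d.]+)", wrk_output)
    if match is None:
        raise ValueError("no 'Requests/sec' line found in wrk output")
    return float(match.group(1))


def percent_change(baseline: float, measured: float) -> float:
    """Relative change of measured vs. baseline in percent (negative = slower)."""
    return (measured - baseline) / baseline * 100.0


if __name__ == "__main__":
    # Invented sample output for illustration only.
    without_docker = "Requests/sec: 100000.00\nTransfer/sec: 12.00MB\n"
    with_docker = "Requests/sec:  80000.00\nTransfer/sec: 9.60MB\n"
    change = percent_change(
        requests_per_sec(without_docker), requests_per_sec(with_docker)
    )
    print(f"throughput change: {change:.1f}%")
```

Saving the wrk output of each step to a file and feeding the text to requests_per_sec keeps the comparison between the "no docker", "docker installed", and "docker removed" runs consistent.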
Other details and logs
In my tests, if you run docker network ls and see any Bridge networks, including the default one, system-wide performance is degraded. The default bridge network cannot be removed by any normal means (i.e. docker network rm). If these steps are followed, the default Bridge network can be removed. Run the benchmark after following the steps and notice performance is almost restored to 'non-docker' levels. I still saw a 5% throughput degradation after removing this network but it was much better than otherwise. Note, TechEmpower installs a network called tfb which creates a Bridge network so I am fairly confident this is an issue worth discussing.
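A quick way to check whether any bridge networks exist might look like the sketch below. It assumes the docker CLI is on the PATH and uses the --format option of docker network ls with the .Name and .Driver placeholders; the helper names are hypothetical.

```python
import subprocess
from typing import List


def bridge_networks(listing: str) -> List[str]:
    """Return names of networks whose driver is 'bridge'.

    `listing` is expected to hold one "<name> <driver>" pair per line.
    """
    names = []
    for line in listing.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1] == "bridge":
            names.append(parts[0])
    return names


def list_docker_networks() -> str:
    """Ask the docker CLI for "<name> <driver>" lines (requires docker)."""
    result = subprocess.run(
        ["docker", "network", "ls", "--format", "{{.Name}} {{.Driver}}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling bridge_networks(list_docker_networks()) on a host with docker installed should return the default bridge network plus any others (such as tfb) that use the bridge driver.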
The basic reasoning I could find for the reduction in performance is Docker's default network configuration: it includes a Bridge network which enables iptables, which can slow down the whole system, even when docker is not in use. There are other network configurations which supposedly do not suffer from this issue, like using macvlan or ipvlan, although it may be good enough to just use --network host without any bridge networks in existence.
@errantmind This is definitely something worth digging into deeper. @msmith-techempower and I looked at this a while back and we did not find any performance degradation, though it's been some time / updates since and certainly things may have changed. At least, if this is the case, all frameworks should be affected the same.
We may not have time to take a look at this in the next couple of weeks, but feel free to drop more info here if you have it. Benchmarking logs with and without the default bridge would be helpful if you have them. Also curious if you were doing this on a single machine or using a multi-machine setup like we do on our Citrine environment.
Thanks for the report!
I'm doing this on a single machine so that could be a factor. I have tried multiple frameworks, each of which experiences a degradation in network throughput (req/s) of about 20%, so you may be right in saying all frameworks should be affected the same. However, I think it is worth looking into at some point because of how it might be affecting the top-end frameworks, which are already very close to each other in performance. Without this overhead (if it exists in your multi-machine environment) it may be possible for them to further differentiate. I'm working on a framework myself and am short on time, but after I get it submitted I'll try to submit some detailed logs.
Disabling userland proxy may alleviate this overhead.
https://franckpachot.medium.com/high-cpu-usage-in-docker-proxy-with-chatty-database-application-disable-userland-proxy-415ffa064955
yes. try setting '"userland-proxy": false' in your daemon.json (usually at /etc/docker/daemon.json) and restarting docker. the overhead should be nowhere near 20% with this disabled.
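For anyone who already has other settings in daemon.json, a small sketch like this could add the key without clobbering the rest. The function name is made up, and the "log-driver" entry is just a hypothetical existing setting for the example; writing the result back to /etc/docker/daemon.json and restarting docker is left out on purpose.

```python
import json


def disable_userland_proxy(daemon_json_text: str) -> str:
    """Return daemon.json content with "userland-proxy" set to false.

    Existing keys are preserved; empty input is treated as an empty config.
    """
    config = json.loads(daemon_json_text) if daemon_json_text.strip() else {}
    config["userland-proxy"] = False
    return json.dumps(config, indent=2) + "\n"


if __name__ == "__main__":
    # Hypothetical existing config, for illustration only.
    print(disable_userland_proxy('{"log-driver": "json-file"}'), end="")
```

Working on the text rather than the file directly makes it easy to review the resulting JSON before it replaces the real config.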
At MS we run all the TE benchmarks with --network host for the same reasons.