Faster Python, beyond semantic interposition
#501 has a useful suggestion for speeding up Python by ~20%. After that's done, it's actually possible to do better.
Host is Fedora 33. All tests were run with Python 3.9.
On host:
- Fedora's Python gives 200K pystone/sec.
- Conda-Forge Python gives 240K pystone/sec.
Running inside Docker 20.04 (cgroups v2 enabled):
- fedora:33 gives 173K pystone/sec.
- python:3.9-slim-buster gives 169K pystone/sec.
- ubuntu:20.04 (Python not built as a shared library) gives 183K pystone/sec.
- continuumio/miniconda3 with Python from Conda-Forge gives 189K pystone/sec.
I am mystified why things are so much slower inside Docker. Some of this is clearly due to the runtime rather than the image. Still, notice that the Ubuntu image is definitely faster.
With podman:
- python:3.9-slim-buster gives 204K pystone/sec.
- continuumio/miniconda3 with Python from Conda-Forge gives 230K pystone/sec.
Note that the Anaconda (default Conda) Python 3.9 does not appear to be faster; the speedup is specific to whatever Conda-Forge does. I am trying to figure that out.
I ran some more benchmarks with the same basic result: this image is the slowest. Details: https://pythonspeed.com/articles/faster-python/
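To put the pystone numbers above on a common scale, here is a small sketch that turns them into "percent slower than the baseline". It only uses the figures quoted in the lists; treating the host build and the container build as the same interpreter (for example host Conda-Forge vs the continuumio/miniconda3 image) is an assumption, so read the percentages as rough indications.

```python
"""Relative pystone throughput from the figures quoted above (higher is better)."""

# (label, baseline K pystone/sec, measured K pystone/sec), copied from the lists.
# Assumption: host and container builds are comparable interpreters.
COMPARISONS = [
    ("Conda-Forge: Docker vs host", 240, 189),
    ("Conda-Forge: podman vs host", 240, 230),
    ("Fedora: fedora:33 in Docker vs host", 200, 173),
    ("python:3.9-slim-buster: Docker vs podman", 204, 169),
]


def slowdown_pct(baseline, measured):
    """Percentage of throughput lost relative to the baseline."""
    return (1 - measured / baseline) * 100


if __name__ == "__main__":
    for label, baseline, measured in COMPARISONS:
        print(f"{label}: {slowdown_pct(baseline, measured):.1f}% slower")
```

Because pystone reports throughput, the ratio is taken as measured/baseline; for timing benchmarks (lower is better) the ratio would be inverted.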
@itamarst
Hi, I found your page very useful. I'm actually a Node.js developer, but I'm currently optimizing our Python Docker images. We use Python 3.7.
Should we build a custom Ubuntu + Python 3.7 image with semantic interposition disabled and LTO enabled for maximum performance? Our Python services are huge anyway (3-5 GB, don't ask ;)) and computationally heavy, so every extra percent of performance is noticeable.
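Before building a custom image, it may help to check what an existing interpreter was already built with. The sketch below searches the configure-time variables exposed by the standard `sysconfig` module for LTO and `-fno-semantic-interposition`. It is a heuristic: which variable carries which flag differs between Python versions and distributors, and variable names a given version does not define simply come back as `None` and are skipped.

```python
import sysconfig

# Configure-time variables that may carry optimization flags. Which ones hold
# -flto / -fno-semantic-interposition varies by Python version and by how the
# interpreter was built, so this sketch simply searches all of them.
_FLAG_VARS = (
    "CONFIG_ARGS", "CFLAGS", "PY_CFLAGS", "PY_CFLAGS_NODIST",
    "CFLAGS_NODIST", "LDFLAGS", "PY_LDFLAGS_NODIST", "LDFLAGS_NODIST",
)


def build_flags():
    """Concatenate every build-flag variable this interpreter reports."""
    values = (sysconfig.get_config_var(name) for name in _FLAG_VARS)
    return " ".join(str(v) for v in values if v is not None)


def build_report(flags):
    """Heuristically detect LTO, PGO and disabled semantic interposition."""
    return {
        "lto": "--with-lto" in flags or "-flto" in flags,
        "no_semantic_interposition": "-fno-semantic-interposition" in flags,
        "pgo": "--enable-optimizations" in flags,
    }


if __name__ == "__main__":
    for key, value in build_report(build_flags()).items():
        print(f"{key}: {value}")
```

Running it with the image's own interpreter (e.g. via `docker run <image> python check_build.py`, where `check_build.py` is a hypothetical file name) shows whether a custom build would add anything.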
@itamarst
The performance hit in Docker comes, IMHO, from seccomp:
https://stackoverflow.com/questions/60840320/docker-50-performance-hit-on-cpu-intensive-code
Yeah, deactivating seccomp results in a massive speed boost. But I guess deactivating seccomp isn't really the intended solution ;)
I read an article saying that Linux 5.11 optimized seccomp, reducing some of its lookup overhead: https://www.phoronix.com/scan.php?page=news_item&px=Linux-5.11-SECCOMP-Performance
So the kernel version of the system you run your tests on also matters for your performance results.
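To confirm whether a process inside a container actually runs under a seccomp filter (for example, with and without `--security-opt seccomp=unconfined`, the flag used in the later benchmark run), you can read the `Seccomp:` field of `/proc/self/status`, where 0 means disabled, 1 strict mode and 2 filter mode. A minimal sketch:

```python
from pathlib import Path

# Meaning of the "Seccomp:" field in /proc/<pid>/status (see proc(5)).
SECCOMP_MODES = {0: "disabled", 1: "strict", 2: "filter"}


def seccomp_mode(status_text):
    """Return the seccomp mode from the contents of /proc/<pid>/status."""
    for line in status_text.splitlines():
        # "Seccomp_filters:" also exists on newer kernels; match the exact key.
        if line.startswith("Seccomp:"):
            return SECCOMP_MODES.get(int(line.split()[1]), "unknown")
    return "unknown"  # field absent, e.g. kernel built without seccomp support


if __name__ == "__main__":
    status = Path("/proc/self/status")
    if status.exists():
        print(seccomp_mode(status.read_text()))
    else:
        print("no /proc/self/status (not Linux?)")
```

Under Docker's default profile this would typically report `filter`, and with `seccomp=unconfined` it would typically report `disabled`.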
@itamarst
I upgraded my machine to Linux 5.11 specifically to test this. The seccomp performance hit does not change:
aras@workstation-111:~/Workspace/python-build-benchmarks$ docker run python-performance
Requirement already satisfied: pyperformance in /usr/local/lib/python3.7/site-packages (1.0.1)
Requirement already satisfied: pyperf in /usr/local/lib/python3.7/site-packages (from pyperformance) (2.2.0)
Python benchmark suite 1.0.1
[1/3] 2to3...
INFO:root:Running `/venv/cpython3.7-51e257070d4f/bin/python -u /venv/cpython3.7-51e257070d4f/lib/python3.7/site-packages/pyperformance/benchmarks/bm_2to3.py --fast --output /tmp/tmp3_xdd1vn`
...........
2to3: Mean +- std dev: 464 ms +- 16 ms
[2/3] django_template...
INFO:root:Running `/venv/cpython3.7-51e257070d4f/bin/python -u /venv/cpython3.7-51e257070d4f/lib/python3.7/site-packages/pyperformance/benchmarks/bm_django_template.py --fast --output /tmp/tmpkx4f5ssh`
...........
django_template: Mean +- std dev: 82.7 ms +- 2.8 ms
[3/3] unpickle_pure_python...
INFO:root:Running `/venv/cpython3.7-51e257070d4f/bin/python -u /venv/cpython3.7-51e257070d4f/lib/python3.7/site-packages/pyperformance/benchmarks/bm_pickle.py --pure-python unpickle --fast --output /tmp/tmp2_9nr_l8`
...........
unpickle_pure_python: Mean +- std dev: 515 us +- 16 us
Performance version: 1.0.1
Report on Linux-5.11.0-13-generic-x86_64-with-debian-10.9
Number of logical CPUs: 8
Start date: 2021-04-11 19:25:56.153610
End date: 2021-04-11 19:26:27.039505
### 2to3 ###
Mean +- std dev: 464 ms +- 16 ms
### django_template ###
Mean +- std dev: 82.7 ms +- 2.8 ms
### unpickle_pure_python ###
Mean +- std dev: 515 us +- 16 us
aras@workstation-111:~/Workspace/python-build-benchmarks$ docker run --security-opt seccomp=unconfined python-performance
Requirement already satisfied: pyperformance in /usr/local/lib/python3.7/site-packages (1.0.1)
Requirement already satisfied: pyperf in /usr/local/lib/python3.7/site-packages (from pyperformance) (2.2.0)
Python benchmark suite 1.0.1
[1/3] 2to3...
INFO:root:Running `/venv/cpython3.7-51e257070d4f/bin/python -u /venv/cpython3.7-51e257070d4f/lib/python3.7/site-packages/pyperformance/benchmarks/bm_2to3.py --fast --output /tmp/tmpjd495cck`
...........
2to3: Mean +- std dev: 372 ms +- 24 ms
[2/3] django_template...
INFO:root:Running `/venv/cpython3.7-51e257070d4f/bin/python -u /venv/cpython3.7-51e257070d4f/lib/python3.7/site-packages/pyperformance/benchmarks/bm_django_template.py --fast --output /tmp/tmpbq4ujyfw`
...........
django_template: Mean +- std dev: 63.4 ms +- 2.2 ms
[3/3] unpickle_pure_python...
INFO:root:Running `/venv/cpython3.7-51e257070d4f/bin/python -u /venv/cpython3.7-51e257070d4f/lib/python3.7/site-packages/pyperformance/benchmarks/bm_pickle.py --pure-python unpickle --fast --output /tmp/tmp18siurkb`
...........
unpickle_pure_python: Mean +- std dev: 375 us +- 12 us
Performance version: 1.0.1
Report on Linux-5.11.0-13-generic-x86_64-with-debian-10.9
Number of logical CPUs: 8
Start date: 2021-04-11 19:26:36.908459
End date: 2021-04-11 19:27:02.124846
### 2to3 ###
Mean +- std dev: 372 ms +- 24 ms
### django_template ###
Mean +- std dev: 63.4 ms +- 2.2 ms
### unpickle_pure_python ###
Mean +- std dev: 375 us +- 12 us
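The two runs above can be condensed into one overhead figure per benchmark. The sketch below uses only the reported means (default seccomp profile vs `--security-opt seccomp=unconfined`); with three benchmarks in `--fast` mode and a single run each, the result is a rough estimate rather than a precise measurement. `statistics.geometric_mean` requires Python 3.8+.

```python
"""Seccomp overhead implied by the two pyperformance runs above (lower time is better)."""
from statistics import geometric_mean

# benchmark: (mean with Docker's default seccomp profile,
#             mean with --security-opt seccomp=unconfined); same unit within a row
RUNS = {
    "2to3": (464.0, 372.0),                  # ms
    "django_template": (82.7, 63.4),         # ms
    "unpickle_pure_python": (515.0, 375.0),  # us
}


def overhead_pct(confined, unconfined):
    """How much longer the confined run took, as a percentage of the unconfined run."""
    return (confined / unconfined - 1) * 100


if __name__ == "__main__":
    for name, (confined, unconfined) in RUNS.items():
        print(f"{name}: +{overhead_pct(confined, unconfined):.1f}%")
    # Geometric mean of the time ratios, so no single benchmark dominates.
    ratio = geometric_mean([c / u for c, u in RUNS.values()])
    print(f"geometric mean slowdown: +{(ratio - 1) * 100:.1f}%")
```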
Further research leads me to believe that Docker has a general seccomp performance hit. I changed the action in Docker's default seccomp profile to SCMP_ACT_KILL, and the service was not killed, so I assume your benchmark never hits a seccomp restriction.
https://github.com/moby/moby/issues/41389 https://github.com/moby/moby/issues/42074
Even a seccomp profile that allows everything results in a performance hit. So merely enabling seccomp causes the performance issue: this is plain overhead, either in the Linux kernel or in Docker.
I ran into a similar issue with seccomp and Docker before. In my case, it turned out that starting an application with seccomp activates not only seccomp but also a speculative-execution mitigation (Speculative Store Bypass, i.e. Spectre variant 4) that was otherwise disabled by default in my kernel. See https://wiki.ubuntu.com/SecurityTeam/KnowledgeBase/SpectreAndMeltdown/MitigationControls and look for spec_store_bypass_disable=[prctl|seccomp].
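Both experiments described above (an allow-everything profile, and the default profile switched to SCMP_ACT_KILL) can be reproduced with a small profile generator. This is a sketch: it assumes you have a local copy of Docker's default seccomp profile JSON, it only touches the top-level `defaultAction` key, and the script and file names are hypothetical.

```python
"""Print seccomp profiles for the allow-all and kill-on-violation experiments."""
import copy
import json
import sys


def allow_all_profile():
    """A profile whose default action allows every syscall (seccomp stays enabled)."""
    return {"defaultAction": "SCMP_ACT_ALLOW"}


def with_default_action(profile, action):
    """Return a copy of `profile` with its default action replaced, e.g. SCMP_ACT_KILL."""
    patched = copy.deepcopy(profile)  # leave the caller's profile untouched
    patched["defaultAction"] = action
    return patched


if __name__ == "__main__":
    # No argument: print the allow-all profile.
    # One argument: path to a copy of Docker's default profile; print it with
    # defaultAction set to SCMP_ACT_KILL.
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as f:
            profile = with_default_action(json.load(f), "SCMP_ACT_KILL")
    else:
        profile = allow_all_profile()
    json.dump(profile, sys.stdout, indent=2)
```

For example, `python make_profile.py > allow-all.json`, then `docker run --security-opt seccomp=/path/to/allow-all.json python-performance`.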
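To see whether this applies to a given host, the kernel reports the Speculative Store Bypass mitigation mode in `/sys/devices/system/cpu/vulnerabilities/spec_store_bypass` (for example "Mitigation: Speculative Store Bypass disabled via prctl and seccomp"), and any explicit `spec_store_bypass_disable=` setting appears on the kernel command line in `/proc/cmdline`. A minimal sketch that reads both:

```python
from pathlib import Path

# sysfs file reporting the Speculative Store Bypass (Spectre v4) mitigation mode.
SSB_STATUS = Path("/sys/devices/system/cpu/vulnerabilities/spec_store_bypass")


def ssb_tied_to_seccomp(status):
    """True if the reported mitigation is switched on for seccomp'd processes."""
    return "seccomp" in status


def cmdline_option(cmdline, name):
    """Return the value of `name=value` on a kernel command line, or None if absent."""
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == name and sep:
            return value
    return None


if __name__ == "__main__":
    if SSB_STATUS.exists():
        status = SSB_STATUS.read_text().strip()
        print(f"spec_store_bypass: {status}")
        print(f"mitigation tied to seccomp: {ssb_tied_to_seccomp(status)}")
    cmdline = Path("/proc/cmdline")
    if cmdline.exists():
        value = cmdline_option(cmdline.read_text(), "spec_store_bypass_disable")
        print(f"spec_store_bypass_disable on command line: {value or 'not set (kernel default)'}")
```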
See https://bugs.python.org/issue38980 for --enable-shared performance.
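Semantic interposition only costs anything when libpython is a shared library, because calls between functions inside a shared library may be routed through the PLT; a statically linked interpreter, like the ubuntu:20.04 one mentioned earlier, avoids that. A quick sketch for checking which kind of build an image ships, using the standard `sysconfig` module (on Windows these variables may be `None`):

```python
import sysconfig


def libpython_linkage():
    """Report whether this interpreter uses a shared libpython (POSIX builds)."""
    return {
        "shared_libpython": bool(sysconfig.get_config_var("Py_ENABLE_SHARED")),
        # e.g. a .so name when built with --enable-shared, a .a name otherwise
        "ldlibrary": sysconfig.get_config_var("LDLIBRARY"),
    }


if __name__ == "__main__":
    info = libpython_linkage()
    print(info)
    if info["shared_libpython"]:
        print("Shared libpython: -fno-semantic-interposition is likely to matter here.")
```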