docker-stacks
Speed up execution using jemalloc
I think it would be nice to try running Jupyter under jemalloc. It is quite a fast allocator.
Because Docker is used, we control the environment, and it's quite easy to run everything under jemalloc (we just need to set ENV LD_PRELOAD=jemalloc_so_lib_here).
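To see whether the preload actually took effect, here is a minimal sketch (the helper below is hypothetical, not part of docker-stacks): on Linux, a process can inspect /proc/self/maps to check which shared objects are mapped in, and jemalloc will appear there if LD_PRELOAD injected it.

```python
# Hypothetical helper (not part of docker-stacks): check whether a shared
# library whose name contains `name` is mapped into the current process.
# If jemalloc was injected via LD_PRELOAD, it shows up in /proc/self/maps.

def library_loaded(name: str) -> bool:
    try:
        with open("/proc/self/maps") as maps:
            return any(name in line for line in maps)
    except FileNotFoundError:  # /proc is Linux-only
        return False

print("jemalloc mapped:", library_loaded("jemalloc"))
```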
Note that Ubuntu Bionic ships an outdated jemalloc, but it's fairly easy to build the latest version (inside Docker).
Could you please tell me how you measure overall performance? I think it's not that difficult to try this out, and it may give a good performance benefit.
As suggested by @ivmaks in a similar issue:
https://gist.github.com/StephanErb/2deeb80b40e59671380a: please add this package and add it to /etc/ld.so.preload
@parente's response:
Customizing the memory allocation strategy sounds like fodder for a recipe in the documentation rather than a default for all users. Feel free to submit a PR adding it to the docs.
Hello,
I must confess I don't know this topic at all; maybe some folks around here could give their opinion. However, it sounds interesting. I've seen that jemalloc seems to be used in Arrow,
which implements an in-memory data format that improves data transfer between the JVM and Python processes.
As far as I know, we do not measure performance here, but we are using a minimal (home-made) test framework that lets us run arbitrary commands inside the containers; here is an example: https://github.com/jupyter/docker-stacks/blob/29f53f8b992797efffbeb23faaee3a9c7edb2a3b/scipy-notebook/test/test_pandas.py#L19
We may reuse this mechanism and run, for example, a performance test with numpy
(since it's at the core of scientific computing on this platform). A benchmark seems to exist out of the box.
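A minimal sketch of such a test (illustrative only; the workload, loop counts, and array sizes are assumptions, not the out-of-the-box benchmark mentioned above): time an allocation-heavy numpy loop under the default allocator, then rerun it with LD_PRELOAD pointing at jemalloc and compare the numbers.

```python
import timeit

import numpy as np

def allocate_arrays(n: int = 200, size: int = 1_000_000) -> None:
    """Allocation-heavy loop: the kind of workload where malloc matters."""
    for _ in range(n):
        a = np.empty(size, dtype=np.float64)
        a[0] = 1.0  # touch the buffer so the allocation is not a no-op

# Run this script twice and compare the timings, e.g.:
#   python bench.py
#   LD_PRELOAD=/opt/conda/lib/libjemalloc.so.2 python bench.py
elapsed = timeit.timeit(allocate_arrays, number=5)
print(f"5 iterations: {elapsed:.3f} s")
```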
Regarding the jemalloc deployment, we are using conda to install the different packages, and I have noticed that version 5.2.1 is available and deploys the library:
$ conda install -y jemalloc
# The following packages will be downloaded:
#
# package | build
# ---------------------------|-----------------
# jemalloc-5.2.1 | he1b5a44_1 11.8 MB conda-forge
# openssl-1.1.1g | h516909a_0 2.1 MB conda-forge
# ------------------------------------------------------------
# Total: 13.9 MB
$ ls -l /opt/conda/lib/*jema*
# -rwxrwxr-x 2 jovyan users 15224152 Apr 21 09:30 /opt/conda/lib/libjemalloc.a
# -rwxrwxr-x 2 jovyan users 15225236 Apr 21 09:30 /opt/conda/lib/libjemalloc_pic.a
# lrwxrwxrwx 1 jovyan users 16 Apr 24 17:18 /opt/conda/lib/libjemalloc.so -> libjemalloc.so.2
# -rwxrwxr-x 2 jovyan users 4402440 Apr 21 09:30 /opt/conda/lib/libjemalloc.so.2
One last thing: we try to keep the image size from growing too much, and +12 MB seems reasonable.
So let's try it, if you agree 👍
Yes, installing via conda is much easier and, well, pythonic.
I'll give this a try as soon as I have time to do it :)
People haven't asked for this feature, so I don't think we should implement it. If someone wants to try it out, use this:
RUN mamba install --yes jemalloc
ENV LD_PRELOAD=/opt/conda/lib/libjemalloc.so.2
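If you do add those two lines, a quick in-container sanity check (a sketch; the library path is assumed from the conda layout shown earlier) is to confirm that LD_PRELOAD is set and points at a file that actually exists:

```python
import os

# Sanity check for the recipe above: LD_PRELOAD should be set inside the
# container and should point at an existing shared object.
preload = os.environ.get("LD_PRELOAD", "")
print("LD_PRELOAD =", preload or "(unset)")
if preload:
    print("library exists:", os.path.exists(preload))
```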