docker-stacks

Speed up execution using jemalloc

Open mathbunnyru opened this issue 4 years ago • 2 comments

I think it would be nice to try running Jupyter under jemalloc. It is quite a fast allocator. Because Docker is used, we control the environment, and it's quite easy to run everything under jemalloc (we just need to set ENV LD_PRELOAD=jemalloc_so_lib_here).
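To make the LD_PRELOAD idea concrete outside of a Dockerfile, here is a minimal sketch of launching a child process with the library preloaded. The helper name `env_with_preload` is made up for illustration, and the library path is the conda location reported elsewhere in this thread; adjust it to wherever libjemalloc actually lives.

```python
import os
import subprocess
import sys


def env_with_preload(lib_path, base_env=None):
    """Return a copy of base_env with lib_path placed first in LD_PRELOAD."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_PRELOAD", "")
    # The dynamic linker accepts a colon- or space-separated list,
    # so keep any entries that were already set.
    env["LD_PRELOAD"] = f"{lib_path}:{existing}" if existing else lib_path
    return env


if __name__ == "__main__":
    # Assumed path (conda install location); not guaranteed on every image.
    lib = "/opt/conda/lib/libjemalloc.so.2"
    if os.path.exists(lib):
        subprocess.run(
            [sys.executable, "-c", "print('child started with LD_PRELOAD set')"],
            env=env_with_preload(lib),
            check=True,
        )
    else:
        print(f"{lib} not found; nothing to preload")
```

Only the child process is affected, which makes this handy for comparing a run with and without jemalloc from the same container.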

I should note that Ubuntu Bionic ships an outdated jemalloc, but it's fairly easy to build the latest one (inside Docker).

Could you please tell me how you measure overall performance? I think this is not that difficult to try out, and it may give a good performance benefit.

As suggested by @ivmaks in a similar issue:

https://gist.github.com/StephanErb/2deeb80b40e59671380a Please add this package and add it to /etc/ld.so.preload

@parente's response:

Customizing the memory allocation strategy sounds like fodder for a recipe in the documentation rather than a default for all users. Feel free to submit a PR adding it to the docs.

mathbunnyru avatar Apr 24 '20 12:04 mathbunnyru

Hello,

I must confess that I don't know this topic at all. Maybe some folks around here could give their opinion. It sounds interesting, though. I've seen that jemalloc seems to be used in Arrow, which implements an in-memory data format that improves the transfer of data between JVM and Python processes.

As far as I know, we do not measure performance here, but we do have a minimal (homemade) test framework that allows running arbitrary commands in the containers. Here is an example: https://github.com/jupyter/docker-stacks/blob/29f53f8b992797efffbeb23faaee3a9c7edb2a3b/scipy-notebook/test/test_pandas.py#L19

We may reuse this mechanism and run, for example, a performance test with numpy (since it's at the core of scientific computing on this platform). A benchmark seems to exist out of the box.
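As a rough idea of what such a performance command could look like, here is a small allocation-heavy micro-benchmark. It is only a sketch: it uses the standard library instead of the numpy benchmark mentioned above, and the function names and sizes are arbitrary choices, not anything that exists in this repository. Running it once normally and once with LD_PRELOAD set would give two numbers to compare.

```python
import timeit


def allocation_workload(n_objects=5_000, size=4096):
    """Allocate and release many buffers to exercise the allocator.

    CPython serves requests of 512 bytes or less from its own small-object
    allocator, so larger buffers are used here to reach the system malloc.
    """
    buffers = [bytearray(size) for _ in range(n_objects)]
    total = sum(len(b) for b in buffers)
    del buffers  # release everything before the next round
    return total


def best_time(repeat=5, number=10):
    """Return the best wall-clock time in seconds over several repeats."""
    return min(timeit.repeat(allocation_workload, repeat=repeat, number=number))


if __name__ == "__main__":
    print(f"best run: {best_time():.4f} s")
```

Taking the minimum over repeats reduces noise from other processes, but a synthetic loop like this says little about real notebook workloads, so treat the result as a first indication only.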

Regarding the jemalloc deployment: we use conda to install the different packages, and I have noticed that version 5.2.1 is available and that it installs the library:

$ conda install -y jemalloc

# The following packages will be downloaded:
# 
#     package                    |            build
#     ---------------------------|-----------------
#     jemalloc-5.2.1             |       he1b5a44_1        11.8 MB  conda-forge
#     openssl-1.1.1g             |       h516909a_0         2.1 MB  conda-forge
#     ------------------------------------------------------------
#                                            Total:        13.9 MB

$ ls -l /opt/conda/lib/*jema*

# -rwxrwxr-x 2 jovyan users 15224152 Apr 21 09:30 /opt/conda/lib/libjemalloc.a
# -rwxrwxr-x 2 jovyan users 15225236 Apr 21 09:30 /opt/conda/lib/libjemalloc_pic.a
# lrwxrwxrwx 1 jovyan users       16 Apr 24 17:18 /opt/conda/lib/libjemalloc.so -> libjemalloc.so.2
# -rwxrwxr-x 2 jovyan users  4402440 Apr 21 09:30 /opt/conda/lib/libjemalloc.so.2

One last thing: we try to keep the image size from growing too much, but +12 MB seems reasonable.

So let's try it, if you agree 👍

romainx avatar Apr 24 '20 17:04 romainx

Yes, installing via conda is much easier and, well, pythonic.

I'll give this a try as soon as I have time to do it :)

mathbunnyru avatar Apr 24 '20 20:04 mathbunnyru

People haven't asked for this feature, so I don't think we should implement it. If someone wants to try it out, use this:

RUN mamba install --yes jemalloc
ENV LD_PRELOAD=/opt/conda/lib/libjemalloc.so.2
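To check that the preload actually took effect inside a running kernel or script, one option on Linux is to look for the library in /proc/self/maps. This is a minimal sketch; the helper name `preloaded_libs` is made up for illustration, and it only tells you the library is mapped, not how much it helps.

```python
def preloaded_libs(maps_text, needle="jemalloc"):
    """Return sorted unique mapped file paths that contain needle."""
    paths = set()
    for line in maps_text.splitlines():
        # Each line of /proc/<pid>/maps reads:
        # address perms offset dev inode [pathname]
        fields = line.split(None, 5)
        if len(fields) == 6 and needle in fields[5]:
            paths.add(fields[5].strip())
    return sorted(paths)


if __name__ == "__main__":
    try:
        with open("/proc/self/maps") as f:
            found = preloaded_libs(f.read())
    except OSError:
        found = []  # not Linux, or /proc is unavailable
    print(found or "jemalloc is not mapped in this process")
```

Run from a notebook cell in an image built with the two lines above, it should list the libjemalloc path if the preload worked.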

mathbunnyru avatar Sep 30 '22 10:09 mathbunnyru