Multithreading issues when running training inside a Docker container
Hello,
I'm facing a huge slowdown when fitting an ALS model in a Docker container; the difference in speed is more than 20x. In both experiments I use the same settings (environment.yml, config file, etc.)
Training outside the container: 15 seconds
Training inside the container: 350 seconds
Environment variables:
os.environ['MKL_NUM_THREADS'] = '1'
os.environ['OPENBLAS_NUM_THREADS'] = '1'
I set those variables in my train script and tried different options. Outside the container everything works fine, but inside the container something weird happens and my htop window is red.
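For context, a minimal sketch of the relevant part of the train script (the data and model parameters here are placeholders, not the real ones):

import os

# These must be set before numpy/scipy/implicit are imported, otherwise the
# BLAS thread pools are already initialised and the values have no effect.
os.environ['OPENBLAS_NUM_THREADS'] = '1'
os.environ['MKL_NUM_THREADS'] = '1'

import scipy.sparse as sp
from implicit.als import AlternatingLeastSquares

# Placeholder interactions matrix; the real script loads its own data.
user_items = sp.random(1000, 500, density=0.01, format='csr')

model = AlternatingLeastSquares(factors=64, iterations=15)
model.fit(user_items)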
Here is my Dockerfile:
FROM continuumio/miniconda3:4.9.2
ENV PYTHONUNBUFFERED=TRUE
ENV PYTHONDONTWRITEBYTECODE=TRUE
ENV PATH="/opt/program:${PATH}"
ARG conda_env=recommender
ENV PATH /opt/conda/envs/$conda_env/bin:$PATH
ENV PYTHONPATH=.
WORKDIR /opt/program
RUN apt-get -y update --allow-releaseinfo-change && \
    apt-get install -y --no-install-recommends nginx redis-server && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*
COPY config/environment.yml /opt/program
RUN conda env create --file environment.yml
COPY recommender /opt/program/recommender
COPY scripts/train /opt/program
RUN chmod +x /opt/program/train
environment.yml
name: recommender
channels:
- defaults
- conda-forge
dependencies:
- python==3.7
- pip==21.1.3
- boto3==1.9.228
- botocore==1.12.228
- cached-property==1.5.1
- elasticsearch==6.3.1
- elasticsearch-dsl==6.3.1
- flask==1.1.1
- gevent==1.3.4
- greenlet==0.4.13
- gunicorn==19.9.0
- pandas==1.3.5
- sqlalchemy==1.4.32
- mysqlclient==2.0.3
- pymysql==1.0.2
- psycopg2==2.7.5
- scipy==1.7.3
- scikit-learn==1.0.2
- nltk==3.7
- gensim==4.1.2
- matplotlib==3.5.1
- nb_conda==2.2.1
- pip:
  - html2text==2020.1.16
  - implicit==0.5.2
  - efficient-apriori==2.0.1
  - hyperopt==0.2.7
  - python-dotenv==0.15.0
  - redis==4.0.2
Working machine:
- MacBook Air (M1, 2020)
- Memory: 16 GB
- vCPU: 8
> conda info
active environment : recommender
active env location : /Users/a_kulesh/miniconda3/envs/recommender
shell level : 2
user config file : /Users/a_kulesh/.condarc
populated config files :
conda version : 4.13.0
conda-build version : not installed
python version : 3.9.12.final.0
virtual packages : __osx=10.16=0
                   __unix=0=0
                   __archspec=1=x86_64
base environment : /Users/a_kulesh/miniconda3 (writable)
conda av data dir : /Users/a_kulesh/miniconda3/etc/conda
conda av metadata url : None
channel URLs : https://repo.anaconda.com/pkgs/main/osx-64
               https://repo.anaconda.com/pkgs/main/noarch
               https://repo.anaconda.com/pkgs/r/osx-64
               https://repo.anaconda.com/pkgs/r/noarch
package cache : /Users/a_kulesh/miniconda3/pkgs
                /Users/a_kulesh/.conda/pkgs
envs directories : /Users/a_kulesh/miniconda3/envs
                   /Users/a_kulesh/.conda/envs
platform : osx-64
user-agent : conda/4.13.0 requests/2.27.1 CPython/3.9.12 Darwin/21.1.0 OSX/10.16
UID:GID : 501:20
netrc file : None
offline mode : False
Has anyone had the same problem, or does anyone have thoughts on this? I would be very grateful.
UPD:
- It looks like the problem is related to the processor type (M1). I tested on Intel processors and there was no difference between running inside and outside the container.
- BUT even with Intel processors, not all vCPUs are used, and the training time actually increases as I add more vCPUs. What might be the reason? (A way to check the active thread pools is sketched below.)
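A quick way to compare what is happening inside and outside the container is to list the BLAS/OpenMP thread pools that are actually loaded, e.g. with threadpoolctl (not in the environment.yml above, so it would need to be installed separately):

import numpy  # ensure the BLAS library is loaded before inspecting it
from threadpoolctl import threadpool_info

# Print every BLAS/OpenMP pool in the process and how many threads each one
# will use - useful for spotting oversubscription.
for pool in threadpool_info():
    print(pool['internal_api'], pool['num_threads'], pool['filepath'])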
Can you show your BLAS config by running this code?
import numpy.__config__
numpy.__config__.show()
Can you also try setting the environment variables outside of Python (like export OPENBLAS_NUM_THREADS=1 in the bash shell before calling)? If you set these environment variables after importing numpy etc. it will be too late to have an effect.
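If exporting the variables is awkward to wire into the container, threadpoolctl can also clamp just the BLAS pools at runtime, which works even after numpy has been imported; a rough sketch (the model and data here are placeholders):

import scipy.sparse as sp
from implicit.als import AlternatingLeastSquares
from threadpoolctl import threadpool_limits

model = AlternatingLeastSquares(factors=64, iterations=15)

# Limit only the BLAS pools to one thread for the duration of the fit,
# leaving implicit's own OpenMP threads untouched.
with threadpool_limits(limits=1, user_api='blas'):
    model.fit(sp.random(1000, 500, density=0.01, format='csr'))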
Let me know if this is still a problem.