Multithreading issues when running training inside a Docker container
Hello,
I'm facing a huge slowdown when fitting an ALS model in a Docker container; the difference in speed is more than 20x. In both experiments I use the same settings (environment.yml, config file, etc.)
Training outside the container: 15 seconds
Training inside the container: 350 seconds
Environment variables:
os.environ['MKL_NUM_THREADS'] = '1'
os.environ['OPENBLAS_NUM_THREADS'] = '1'
I set those variables in my train script and tried different options. Outside the container everything works fine, but inside the container something weird happens and my htop window is red.
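For context, a minimal sketch of the relevant part of the train script (the data and model parameters here are placeholders, not the real ones):

import os

# These must be set before numpy/scipy/implicit are imported, otherwise the
# BLAS thread pools are already initialised and the values have no effect.
os.environ['OPENBLAS_NUM_THREADS'] = '1'
os.environ['MKL_NUM_THREADS'] = '1'

import scipy.sparse as sp
from implicit.als import AlternatingLeastSquares

# Placeholder interactions matrix; the real script loads its own data.
user_items = sp.random(1000, 500, density=0.01, format='csr')

model = AlternatingLeastSquares(factors=64, iterations=15)
model.fit(user_items)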
Here is my Dockerfile:
FROM continuumio/miniconda3:4.9.2
ENV PYTHONUNBUFFERED=TRUE
ENV PYTHONDONTWRITEBYTECODE=TRUE
ENV PATH="/opt/program:${PATH}"
ARG conda_env=recommender
ENV PATH /opt/conda/envs/$conda_env/bin:$PATH
ENV PYTHONPATH=.
WORKDIR /opt/program
RUN apt-get -y update --allow-releaseinfo-change && \
    apt-get install -y --no-install-recommends nginx redis-server && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*
COPY config/environment.yml /opt/program
RUN conda env create --file environment.yml
COPY recommender /opt/program/recommender
COPY scripts/train /opt/program
RUN chmod +x /opt/program/train
environment.yml
name: recommender
channels:
- defaults
- conda-forge
dependencies:
- python==3.7
- pip==21.1.3
- boto3==1.9.228
- botocore==1.12.228
- cached-property==1.5.1
- elasticsearch==6.3.1
- elasticsearch-dsl==6.3.1
- flask==1.1.1
- gevent==1.3.4
- greenlet==0.4.13
- gunicorn==19.9.0
- pandas==1.3.5
- sqlalchemy==1.4.32
- mysqlclient==2.0.3
- pymysql==1.0.2
- psycopg2==2.7.5
- scipy==1.7.3
- scikit-learn==1.0.2
- nltk==3.7
- gensim==4.1.2
- matplotlib==3.5.1
- nb_conda==2.2.1
- pip:
  - html2text==2020.1.16
  - implicit==0.5.2
  - efficient-apriori==2.0.1
  - hyperopt==0.2.7
  - python-dotenv==0.15.0
  - redis==4.0.2
Working machine:
- MacBook Air (M1, 2020)
- Memory: 16 GB
- vCPU: 8
> conda info
active environment : recommender
active env location : /Users/a_kulesh/miniconda3/envs/recommender
shell level : 2
user config file : /Users/a_kulesh/.condarc
populated config files :
conda version : 4.13.0
conda-build version : not installed
python version : 3.9.12.final.0
virtual packages : __osx=10.16=0
                   __unix=0=0
                   __archspec=1=x86_64
base environment : /Users/a_kulesh/miniconda3 (writable)
conda av data dir : /Users/a_kulesh/miniconda3/etc/conda
conda av metadata url : None
channel URLs : https://repo.anaconda.com/pkgs/main/osx-64
               https://repo.anaconda.com/pkgs/main/noarch
               https://repo.anaconda.com/pkgs/r/osx-64
               https://repo.anaconda.com/pkgs/r/noarch
package cache : /Users/a_kulesh/miniconda3/pkgs
                /Users/a_kulesh/.conda/pkgs
envs directories : /Users/a_kulesh/miniconda3/envs
                   /Users/a_kulesh/.conda/envs
platform : osx-64
user-agent : conda/4.13.0 requests/2.27.1 CPython/3.9.12 Darwin/21.1.0 OSX/10.16
UID:GID : 501:20
netrc file : None
offline mode : False
Has anyone had the same problem, or does anyone have thoughts on this? I would be very grateful.
UPD:
- It looks like the problem is related to the processor type (M1). I tested on Intel processors and there was no difference between running inside and outside the container.
- BUT even with Intel processors, not all vCPUs are used, and the training time actually increases as I add more vCPUs. What might be the reason? (A way to check the active thread pools is sketched below.)
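A quick way to compare what is happening inside and outside the container is to list the BLAS/OpenMP thread pools that are actually loaded, e.g. with threadpoolctl (not in the environment.yml above, so it would need to be installed separately):

import numpy  # ensure the BLAS library is loaded before inspecting it
from threadpoolctl import threadpool_info

# Print every BLAS/OpenMP pool in the process and how many threads each one
# will use - useful for spotting oversubscription.
for pool in threadpool_info():
    print(pool['internal_api'], pool['num_threads'], pool['filepath'])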
Can you show your BLAS config by running this code?
import numpy.__config__
numpy.__config__.show()
Can you also try setting the environment variables outside of Python (like export OPENBLAS_NUM_THREADS=1 in the bash shell before calling)? If you set these environment variables after importing numpy etc. it will be too late to have an effect.
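If exporting the variables is awkward to wire into the container, threadpoolctl can also clamp just the BLAS pools at runtime, which works even after numpy has been imported; a rough sketch (the model and data here are placeholders):

import scipy.sparse as sp
from implicit.als import AlternatingLeastSquares
from threadpoolctl import threadpool_limits

model = AlternatingLeastSquares(factors=64, iterations=15)

# Limit only the BLAS pools to one thread for the duration of the fit,
# leaving implicit's own OpenMP threads untouched.
with threadpool_limits(limits=1, user_api='blas'):
    model.fit(sp.random(1000, 500, density=0.01, format='csr'))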
Let me know if this is still a problem.