stanza
Low performance on many-core systems
Describe the bug When Stanza runs in a Docker container on a server with more than ~20 cores, pipeline performance drops dramatically.
To Reproduce
- Get a machine with a large number of CPU cores;
- Build and run a Docker container with Stanza;
- Run a Stanza pipeline with language lv and processors='tokenize,pos,lemma' (for example);
- Get predictions for a text of typical length (1-4k characters);
- Measure performance.
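The measurement step above can be sketched roughly as follows. This is an illustration only, not the exact script behind the report: `time_calls` and `build_pipeline` are hypothetical helper names, the repeat count is arbitrary, and `stanza` is imported lazily so the timing helper can be used on its own.

```python
import time
from statistics import mean


def time_calls(fn, text, repeats=5):
    """Return the mean wall-clock time in seconds of calling fn(text)."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(text)
        durations.append(time.perf_counter() - start)
    return mean(durations)


def build_pipeline():
    """Build the Latvian pipeline from the steps above (needs Stanza installed)."""
    import stanza  # lazy import: only required when the pipeline is built

    stanza.download("lv")  # fetch the Latvian models if they are missing
    return stanza.Pipeline("lv", processors="tokenize,pos,lemma")
```

Running `time_calls(build_pipeline(), some_text)` on machines with different core counts (or with different `OMP_NUM_THREADS` values) should make the slowdown visible; `some_text` stands for any 1-4k character Latvian text.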
Expected behavior Performance should improve as work is parallelized across cores, or at least stay the same. An option to limit the number of cores/threads would be a useful feature.
Environment (please complete the following information):
- OS: Ubuntu 18.04/20.04 in container; CentOS on the machine
- Python version: Python 3.6/3.8
- Stanza version: 1.0.0; 1.0.1; 1.1.1
Additional context We build Docker containers for our own purposes, but I think the same behavior can be reproduced with the system interpreter. Our case was made worse by Kubernetes, where all cores of the node are visible to Python but CPU usage is limited. When running in plain Docker on the machine, the problem was the same but less severe. The root cause is probably in Torch. We solved it by explicitly setting the environment variable "OMP_NUM_THREADS", but this may not be good practice if the user also needs the framework elsewhere.
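A minimal sketch of that workaround, under some assumptions: the helper names `capped_threads` and `apply_thread_limit` and the default limit of 4 are hypothetical, and the environment variable has to be set before `torch` (and therefore Stanza) is first imported for OpenMP to pick it up. `torch.set_num_threads` is a per-process alternative that does not rely on the environment.

```python
import os


def capped_threads(limit, visible=None):
    """Return a thread count between 1 and min(limit, visible CPUs)."""
    if visible is None:
        # In Kubernetes this reports the node's cores, not the CPU quota.
        visible = os.cpu_count() or 1
    return max(1, min(limit, visible))


def apply_thread_limit(limit=4):
    """Cap OpenMP/PyTorch CPU threads; call before importing stanza."""
    n = capped_threads(limit)
    os.environ["OMP_NUM_THREADS"] = str(n)  # read by OpenMP at startup
    try:
        import torch
    except ImportError:
        return n  # PyTorch not installed: only the env variable is set
    torch.set_num_threads(n)  # caps intra-op parallelism in this process
    return n
```

Calling `apply_thread_limit(4)` at the very top of the entry point, before `import stanza`, keeps the limit local to that process instead of forcing it on everything in the container.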
Hi @StarTessar, thanks for reporting this! This may be due to the nature of PyTorch parallelization, since, as you say, the problem is solved by explicitly declaring OMP_NUM_THREADS in the program. I think this is a great solution. Do you have any other suggestions for solving this?