EconML
Performance deterioration with greater number of cores
I am fitting a CausalForestDML model on an Azure Machine Learning compute instance with 32 cores. My dataset has ~3 million records and ~60 variables. Surprisingly, fitting gets slower as the number of cores increases. For instance, a model with n_jobs=-1 (i.e. all 32 cores) runs slower than a model with n_jobs=12, which in turn runs slower than a model with n_jobs=6 (see attached images).
[Image: 6core]
[Image: 12core]
[Image: 32core]
I am using the standard threading backend. Given the low-overhead nature of threading, I am a little puzzled as to why more parallelism would slow fitting down. I would be glad to get any insight into this behavior. Has anyone else experienced this issue? Could it have something to do with the compute instance itself?
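For anyone wanting to reproduce the comparison, here is a minimal sketch of a timing harness. It is not the code used for the timings above: the helper names `time_across_n_jobs` and `fit_causal_forest`, the synthetic data, and the sizes (20,000 rows, 10 features, 100 trees) are all assumptions chosen for illustration, far smaller than the real ~3 million x ~60 dataset.

```python
import time
from typing import Callable, Dict, Iterable


def time_across_n_jobs(fit: Callable[[int], object],
                       n_jobs_values: Iterable[int],
                       repeats: int = 1) -> Dict[int, float]:
    """Return the best wall-clock time in seconds of fit(n_jobs) per value."""
    timings = {}
    for n_jobs in n_jobs_values:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            fit(n_jobs)
            best = min(best, time.perf_counter() - start)
        timings[n_jobs] = best
    return timings


def fit_causal_forest(n_jobs: int, n_rows: int = 20_000, n_cols: int = 10):
    """Fit CausalForestDML on synthetic data (hypothetical, illustrative sizes)."""
    # Imported here so the timing helper above works without these packages.
    import numpy as np
    from econml.dml import CausalForestDML

    rng = np.random.default_rng(0)
    X = rng.normal(size=(n_rows, n_cols))
    T = X[:, 0] + rng.normal(size=n_rows)                        # continuous treatment
    Y = (1.0 + X[:, 1]) * T + X[:, 2] + rng.normal(size=n_rows)  # heterogeneous effect
    est = CausalForestDML(n_estimators=100, n_jobs=n_jobs, random_state=0)
    est.fit(Y, T, X=X)
    return est
```

Calling `time_across_n_jobs(fit_causal_forest, [6, 12, -1])` returns a dictionary mapping each n_jobs value to its fit time, which makes the three runs directly comparable on the same data and seed.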
Compute instance:
- Standard_F32s_v2 (32 cores, 64 GB RAM, 265 GB disk)
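One quick check on the instance side is whether the Python process can actually use all 32 cores. The sketch below uses only the standard library; the helper name `visible_cores` is hypothetical, and `os.sched_getaffinity` is only available on some platforms (Linux among them), so the code falls back to `os.cpu_count()` elsewhere.

```python
import os


def visible_cores() -> dict:
    """Report logical CPUs on the machine and CPUs usable by this process."""
    total = os.cpu_count()  # logical CPUs; may be None if undetermined
    if hasattr(os, "sched_getaffinity"):
        # Respects CPU affinity masks, which can restrict a process to fewer CPUs.
        usable = len(os.sched_getaffinity(0))
    else:
        usable = total
    return {"logical_cpus": total, "usable_by_process": usable}


print(visible_cores())
```

If `usable_by_process` is lower than the core count listed for the instance, the extra workers requested through n_jobs would be competing for fewer CPUs than expected.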
Environment:
- Linux, Ubuntu 18.04
- Python 3.8.10
- EconML 0.12.0