pyABC
RAM usage grows without bound when using `pyabc.sampler.SingleCoreSampler()`
Bug description
When I use pyabc.ABCSMC() with sampler=pyabc.sampler.SingleCoreSampler(), the RAM usage sometimes grows until all available RAM is consumed. This happens rarely, but I tested it enough times to reproduce it. The issue goes away if I instead use sampler=pyabc.sampler.MulticoreEvalParallelSampler(n_procs=1).
Script with sampler=pyabc.sampler.SingleCoreSampler()
Exact same script but using sampler=pyabc.sampler.MulticoreEvalParallelSampler(n_procs=1)
Expected behavior: RAM usage should stay bounded instead of consuming all available memory.
To reproduce: I can't provide a reproduction; my script is very large and the issue does not happen every time.
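When the full script is hard to share, a short memory trace can still help narrow things down. The following is a minimal sketch using the standard library's `tracemalloc`; `measure_growth` and `leaky_step` are hypothetical names used only for illustration, and `leaky_step` stands in for whatever call is suspected of leaking. Note that `tracemalloc` only sees allocations made through Python's allocator, so memory held by native libraries may not show up.

```python
import tracemalloc


def measure_growth(step, n_iterations=5):
    """Call `step` repeatedly and record traced Python memory after each call.

    Returns a list of byte counts, one per iteration.
    """
    tracemalloc.start()
    try:
        usage = []
        for _ in range(n_iterations):
            step()
            current, _peak = tracemalloc.get_traced_memory()
            usage.append(current)
        return usage
    finally:
        tracemalloc.stop()


# Hypothetical stand-in for one sampling step that never frees its results.
_leak = []


def leaky_step():
    _leak.append(bytearray(1_000_000))


if __name__ == "__main__":
    for i, n_bytes in enumerate(measure_growth(leaky_step)):
        print(f"iteration {i}: {n_bytes / 1e6:.1f} MB traced")
```

A steadily rising series across iterations points at Python-level objects being retained; a flat series while system RAM still climbs would suggest looking outside the Python heap instead.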
Environment
Name: pyabc
Version: 0.12.13
Summary: Distributed, likelihood-free ABC-SMC inference
Home-page: https://github.com/icb-dcm/pyabc
Author: The pyABC developers
Author-email: [email protected]
License: BSD-3-Clause
Location: /home/gabriel/miniconda3/envs/asteca/lib/python3.12/site-packages
Requires: click, cloudpickle, distributed, gitpython, jabbar, matplotlib, numpy, pandas, redis, scikit-learn, scipy, sqlalchemy
Required-by:
/home/gabriel/miniconda3/envs/asteca/bin/python
Python 3.12.0
elementary OS 7.1 (based on Ubuntu 22.04.3 LTS); Linux 6.5.0-14-generic
Thanks @Gabriel-p for reporting this.
So are you saying you cannot provide the script for us to test to reproduce the results? It would be good to confirm it on another installation.
Let me see if I can clean it up and reduce the number of files to the minimum required
Ok, here's the compressed file with everything needed to reproduce the issue. You'll need a conda environment with:
python 3.12.0
pyABC 0.12.13
numpy 1.26.2
scipy 1.11.13
astropy 5.3.4
pandas 2.1.1
fastparquet 2023.10.1
fast_histogram 0.12
Then just run the test_pyABC.py script, changing lines 90 and 91 to switch between samplers.
Let me know if something does not work.
Ah, perfect, we will have a look at this.
@Gabriel-p I can't reproduce your issue here. How often does this error occur?
Hi @stephanmg, I think I sent the files improperly packaged, so I'm not sure whether you managed to run test_pyABC.py. If not, let me know.
I can reproduce the issue 100% of the time, even after restarting the system. Another thing I've noticed is that sometimes the script keeps running in the background even after I close my IDE (Sublime Text).
Yes, please re-package if possible and I will give it another try. Thanks for your patience.
Now it should work: pyABC_test.zip
Hi @Gabriel-p I can't reproduce it here, I will also assign @arrjon to check the issue.
Ok, I can still reproduce this issue 100% of the time, so let me know what I can do to help.
I checked it now on MacOS, and it seems like SingleCoreSampler() is opening more threads than it should. This might explain your issue and seems to be a bug. Using MulticoreEvalParallelSampler(n_procs=1) works as expected.
Hi @Gabriel-p,
could you show the value of OMP_NUM_THREADS, e.g. with echo $OMP_NUM_THREADS?
... and could you try the branch fix_singlecore, and let me know if it works?
echo $OMP_NUM_THREADS returns nothing.
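An empty result from `echo` means the variable is unset in the shell. The same check can be done from inside Python, which is useful when the script is launched from an IDE whose environment may differ from the terminal. This is a minimal sketch: `thread_env` is a hypothetical helper, and the extra variable names besides `OMP_NUM_THREADS` are an assumption about which numerical backends might be in use.

```python
import os

# Environment variables commonly read by numerical backends to size their
# thread pools. Which ones matter depends on how NumPy/SciPy were built.
THREAD_VARS = ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS")


def thread_env(environ=None):
    """Return {variable: value or None} for the thread-related variables."""
    environ = os.environ if environ is None else environ
    return {name: environ.get(name) for name in THREAD_VARS}


if __name__ == "__main__":
    for name, value in thread_env().items():
        print(f"{name} = {value if value is not None else '<unset>'}")
```

If one wants to experiment with capping the thread count, these variables generally need to be set before the numerical libraries are imported. Whether doing so affects this issue is not established in this thread.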
This is the output to screen with the fix_singlecore branch and sampler=pyabc.sampler.MulticoreEvalParallelSampler(n_procs=1):
ABC.Sampler INFO: Parallelize sampling on 1 processes.
ABC.Sampler INFO: Parallelize sampling on 1 processes.
ABC.History INFO: Start <ABCSMC id=5, start_time=2024-02-06 08:38:41>
ABC.History INFO: Start <ABCSMC id=5, start_time=2024-02-06 08:38:41>
ABC INFO: Calibration sample t = -1.
ABC INFO: Calibration sample t = -1.
ABC INFO: t: 0, eps: 1.32229323e-01.
ABC INFO: t: 0, eps: 1.32229323e-01.
ABC INFO: Accepted: 500 / 1031 = 4.8497e-01, ESS: 5.0000e+02.
ABC INFO: Accepted: 500 / 1031 = 4.8497e-01, ESS: 5.0000e+02.
ABC INFO: t: 1, eps: 1.00988341e-01.
ABC INFO: t: 1, eps: 1.00988341e-01.
ABC INFO: Accepted: 500 / 972 = 5.1440e-01, ESS: 4.2383e+02.
ABC INFO: Accepted: 500 / 972 = 5.1440e-01, ESS: 4.2383e+02.
ABC INFO: t: 2, eps: 8.23765786e-02.
ABC INFO: t: 2, eps: 8.23765786e-02.
ABC INFO: Accepted: 500 / 1098 = 4.5537e-01, ESS: 4.1058e+02.
ABC INFO: Accepted: 500 / 1098 = 4.5537e-01, ESS: 4.1058e+02.
ABC INFO: t: 3, eps: 7.20554730e-02.
ABC INFO: t: 3, eps: 7.20554730e-02.
ABC INFO: Accepted: 500 / 1096 = 4.5620e-01, ESS: 4.2701e+02.
ABC INFO: Accepted: 500 / 1096 = 4.5620e-01, ESS: 4.2701e+02.
ABC INFO: t: 4, eps: 6.45272070e-02.
ABC INFO: t: 4, eps: 6.45272070e-02.
ABC INFO: Accepted: 500 / 1144 = 4.3706e-01, ESS: 4.2139e+02.
ABC INFO: Accepted: 500 / 1144 = 4.3706e-01, ESS: 4.2139e+02.
ABC INFO: Stop: Maximum walltime.
ABC INFO: Stop: Maximum walltime.
ABC.History INFO: Done <ABCSMC id=5, duration=0:02:05.371858, end_time=2024-02-06 08:40:47>
ABC.History INFO: Done <ABCSMC id=5, duration=0:02:05.371858, end_time=2024-02-06 08:40:47>
It appears to be running the sampler twice? The RAM usage stays low as expected.
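The paired lines above carry identical run IDs and timestamps, which is what one would see if each log record were emitted twice rather than the sampler running twice. One common cause of that pattern is a logger with two handlers attached. Whether that is what happens here is an assumption, not something confirmed in this thread. The sketch below reproduces the effect with the standard `logging` module only; the logger name `Demo.Sampler` and both helper functions are made up for illustration.

```python
import io
import logging


def make_logger(name, stream, n_handlers):
    """Return a logger with `n_handlers` stream handlers writing to `stream`."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep the root logger out of the demonstration
    logger.handlers.clear()
    for _ in range(n_handlers):
        handler = logging.StreamHandler(stream)
        handler.setFormatter(
            logging.Formatter("%(name)s %(levelname)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger


def emitted_lines(n_handlers):
    """Log one message and return the lines that reached the stream."""
    stream = io.StringIO()
    logger = make_logger("Demo.Sampler", stream, n_handlers)
    logger.info("Parallelize sampling on 1 processes.")
    return stream.getvalue().splitlines()


if __name__ == "__main__":
    print(emitted_lines(1))  # one handler: the message appears once
    print(emitted_lines(2))  # two handlers: the same message appears twice
```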
This is the output to screen with the fix_singlecore branch and sampler=pyabc.sampler.SingleCoreSampler():
ABC.History INFO: Start <ABCSMC id=6, start_time=2024-02-06 08:41:40>
ABC.History INFO: Start <ABCSMC id=6, start_time=2024-02-06 08:41:40>
ABC INFO: Calibration sample t = -1.
ABC INFO: Calibration sample t = -1.
Active threads: <function active_count at 0x7f4dbc321120>
[<_MainThread(MainThread, started 139971898045504)>]
Active threads: <function active_count at 0x7f4dbc321120>
[<_MainThread(MainThread, started 139971898045504)>]
Active threads: <function active_count at 0x7f4dbc321120>
[<_MainThread(MainThread, started 139971898045504)>]
Active threads: <function active_count at 0x7f4dbc321120>
[<_MainThread(MainThread, started 139971898045504)>]
Active threads: <function active_count at 0x7f4dbc321120>
[<_MainThread(MainThread, started 139971898045504)>]
Active threads: <function active_count at 0x7f4dbc321120>
[<_MainThread(MainThread, started 139971898045504)>]
Active threads: <function active_count at 0x7f4dbc321120>
[<_MainThread(MainThread, started 139971898045504)>]
Active threads: <function active_count at 0x7f4dbc321120>
....
The RAM usage immediately starts climbing.
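The `Active threads:` lines above show a function object (`<function active_count at 0x...>`) rather than a number, which is what Python prints when `threading.active_count` is referenced without being called. A minimal sketch of a debug helper that reports the actual count is shown here; `describe_threads` is a hypothetical name, not part of pyABC.

```python
import threading


def describe_threads():
    """Return (count, names) for the threads currently alive in this process."""
    # active_count must be *called*; printing the bare name shows
    # "<function active_count at 0x...>" instead of a number.
    return threading.active_count(), [t.name for t in threading.enumerate()]


if __name__ == "__main__":
    count, names = describe_threads()
    print(f"Active threads: {count} {names}")
```

Keep in mind that `threading` only reports threads created through Python. Threads started by native libraries (for example an OpenMP or BLAS thread pool) do not appear in this list, so a single `MainThread` entry does not rule out extra native threads.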
Thanks for the information @Gabriel-p - we are currently still troubleshooting the issue. We will push the fix, when it's ready, to the fix_singlecore branch for you.
@Gabriel-p might be related to this issue: https://github.com/ICB-DCM/pyPESTO/issues/1312
Could you please try the fix_singlecore branch again?
@stephanmg I just tested the fix_singlecore branch; the issue is still there.
Thanks for testing so quickly. I had hoped the issue would go away in light of this, but it seems we need to dig deeper.
@arrjon could you review my changes? I think this issue should be resolved ASAP.
@Gabriel-p could you confirm that the issue persists in the latest pyABC release 0.12.16?
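Before re-testing, it can help to confirm which pyABC version the active environment actually resolves to. This is a minimal sketch using `importlib.metadata` from the standard library; `installed_version` and `version_tuple` are hypothetical helpers, and `version_tuple` assumes a plain dotted numeric release string such as `0.12.16` (it does not handle pre-release or dev suffixes).

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_tuple(version):
    """Turn a plain dotted release such as '0.12.16' into (0, 12, 16)."""
    return tuple(int(part) for part in version.split("."))


if __name__ == "__main__":
    found = installed_version("pyabc")
    if found is None:
        print("pyabc is not installed in this environment")
    else:
        is_new_enough = version_tuple(found) >= version_tuple("0.12.16")
        print(f"pyabc {found}; at least 0.12.16: {is_new_enough}")
```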
The issue appears to be solved in this new version.
Thanks for the quick reply.
If the issue surfaces again, we will have to scrutinize pyABC once more. In that case, feel free to re-open this issue (#626).