
RAM usage grows without bound when using `pyabc.sampler.SingleCoreSampler()`

Open Gabriel-p opened this issue 1 year ago • 18 comments

Bug description When I use pyabc.ABCSMC() with sampler=pyabc.sampler.SingleCoreSampler(), RAM usage sometimes grows until all available RAM is consumed. This happens rarely, but I have tested it enough times to reproduce it. The issue goes away if I instead use sampler=pyabc.sampler.MulticoreEvalParallelSampler(n_procs=1).

Script with sampler=pyabc.sampler.SingleCoreSampler()

Screenshot from 2024-01-16 10 37 50

Exact same script but using sampler=pyabc.sampler.MulticoreEvalParallelSampler(n_procs=1)

Screenshot from 2024-01-16 10 38 11

Expected behavior RAM usage should stay bounded instead of consuming all available RAM.

To reproduce I can't; my script is very large, and the problem does not happen every time.
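For anyone who wants to check for growth of this kind without the full script, here is a minimal sketch using the standard-library tracemalloc module. It is not part of pyABC; traced_growth and leaky_step are hypothetical names, and leaky_step is only a stand-in for one sampling iteration that keeps references alive. Note that tracemalloc only sees allocations made through Python's allocators, so growth inside native libraries would not show up here.

```python
import tracemalloc


def traced_growth(step, n_steps=5):
    """Call step() n_steps times; return traced memory in bytes after each call."""
    tracemalloc.start()
    try:
        readings = []
        for _ in range(n_steps):
            step()
            current, _peak = tracemalloc.get_traced_memory()
            readings.append(current)
        return readings
    finally:
        tracemalloc.stop()


# Hypothetical stand-in for a sampling iteration that never releases its data.
_leak = []


def leaky_step():
    _leak.append(bytearray(1_000_000))


if __name__ == "__main__":
    for i, n_bytes in enumerate(traced_growth(leaky_step)):
        print(f"step {i}: {n_bytes / 1e6:.1f} MB traced")
```

Readings that keep rising from step to step, as they do for leaky_step, are the pattern to look for; a well-behaved step should level off.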

Environment

Name: pyabc
Version: 0.12.13
Summary: Distributed, likelihood-free ABC-SMC inference
Home-page: https://github.com/icb-dcm/pyabc
Author: The pyABC developers
Author-email: [email protected]
License: BSD-3-Clause
Location: /home/gabriel/miniconda3/envs/asteca/lib/python3.12/site-packages
Requires: click, cloudpickle, distributed, gitpython, jabbar, matplotlib, numpy, pandas, redis, scikit-learn, scipy, sqlalchemy
Required-by:
/home/gabriel/miniconda3/envs/asteca/bin/python
Python 3.12.0

elementary OS 7.1 (based on Ubuntu 22.04.3 LTS); Linux 6.5.0-14-generic

Gabriel-p avatar Jan 16 '24 13:01 Gabriel-p

Thanks @Gabriel-p for reporting this.

So are you saying you cannot provide the script for us to test to reproduce the results? It would be good to confirm it on another installation.

stephanmg avatar Jan 16 '24 13:01 stephanmg

Let me see if I can clean it up and reduce the number of files to the minimum required

Gabriel-p avatar Jan 16 '24 14:01 Gabriel-p

Ok, here's the compressed file with everything needed to reproduce the issue. You'll need a conda environment with:

python 3.12.0
pyABC 0.12.13
numpy 1.26.2
scipy  1.11.13
astropy 5.3.4
pandas 2.1.1
fastparquet 2023.10.1
fast_histogram 0.12

Then just run the test_pyABC.py script, changing lines 90 and 91 to switch between samplers.

Let me know if something does not work.

pyABC_test.zip

Gabriel-p avatar Jan 16 '24 14:01 Gabriel-p

Ah, perfect, we will have a look at this.

stephanmg avatar Jan 16 '24 14:01 stephanmg

@Gabriel-p I can't reproduce your issue here. How often does this error occur?

stephanmg avatar Jan 17 '24 09:01 stephanmg

Hi @stephanmg, I think I sent the files improperly packaged. I'm not sure whether you managed to run test_pyABC.py; if not, let me know.

I can reproduce the issue 100% of the time, even after restarting the system. Another thing I've noticed is that sometimes the script keeps running in the background even after I close my IDE (Sublime Text).

Gabriel-p avatar Jan 17 '24 11:01 Gabriel-p

Yes, please re-package if possible and I will give it another try. Thanks for your patience.

stephanmg avatar Jan 17 '24 11:01 stephanmg

Now it should work: pyABC_test.zip

Gabriel-p avatar Jan 17 '24 11:01 Gabriel-p

Hi @Gabriel-p I can't reproduce it here, I will also assign @arrjon to check the issue.

stephanmg avatar Jan 24 '24 16:01 stephanmg

Ok, I can still reproduce this issue 100% of the time, so let me know what I can do to help.

Gabriel-p avatar Jan 24 '24 19:01 Gabriel-p

I checked it now on macOS, and it seems like SingleCoreSampler() is opening more threads than it should. This might explain your issue and seems to be a bug. Using MulticoreEvalParallelSampler(n_procs=1) works as expected.

arrjon avatar Feb 05 '24 08:02 arrjon

Hi @Gabriel-p,

could you show the content of OMP_NUM_THREADS, e.g. by running echo $OMP_NUM_THREADS?
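The same check can be done from inside the Python process that runs the script, which rules out differences between the shell and the IDE environment. This is a minimal sketch: thread_env is a hypothetical helper, and the list of variables beyond OMP_NUM_THREADS is an assumption, since which ones matter depends on how NumPy and SciPy were built.

```python
import os

# Variables commonly read by OpenMP/BLAS backends. Only OMP_NUM_THREADS is
# mentioned in this thread; the other two are assumptions for illustration.
THREAD_ENV_VARS = ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS")


def thread_env(environ=None):
    """Return {name: value or None} for each thread-related variable."""
    env = os.environ if environ is None else environ
    return {name: env.get(name) for name in THREAD_ENV_VARS}


if __name__ == "__main__":
    for name, value in thread_env().items():
        # None means the variable is unset in this process.
        print(f"{name} = {value!r}")
```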

stephanmg avatar Feb 05 '24 09:02 stephanmg

... and could you try the branch fix_singlecore, and let me know if it works?

stephanmg avatar Feb 05 '24 09:02 stephanmg

echo $OMP_NUM_THREADS returns nothing.

This is the output to screen with the fix_singlecore branch and sampler=pyabc.sampler.MulticoreEvalParallelSampler(n_procs=1):

ABC.Sampler INFO: Parallelize sampling on 1 processes.
ABC.Sampler INFO: Parallelize sampling on 1 processes.
ABC.History INFO: Start <ABCSMC id=5, start_time=2024-02-06 08:38:41>
ABC.History INFO: Start <ABCSMC id=5, start_time=2024-02-06 08:38:41>
ABC INFO: Calibration sample t = -1.
ABC INFO: Calibration sample t = -1.
ABC INFO: t: 0, eps: 1.32229323e-01.
ABC INFO: t: 0, eps: 1.32229323e-01.
ABC INFO: Accepted: 500 / 1031 = 4.8497e-01, ESS: 5.0000e+02.
ABC INFO: Accepted: 500 / 1031 = 4.8497e-01, ESS: 5.0000e+02.
ABC INFO: t: 1, eps: 1.00988341e-01.
ABC INFO: t: 1, eps: 1.00988341e-01.
ABC INFO: Accepted: 500 / 972 = 5.1440e-01, ESS: 4.2383e+02.
ABC INFO: Accepted: 500 / 972 = 5.1440e-01, ESS: 4.2383e+02.
ABC INFO: t: 2, eps: 8.23765786e-02.
ABC INFO: t: 2, eps: 8.23765786e-02.
ABC INFO: Accepted: 500 / 1098 = 4.5537e-01, ESS: 4.1058e+02.
ABC INFO: Accepted: 500 / 1098 = 4.5537e-01, ESS: 4.1058e+02.
ABC INFO: t: 3, eps: 7.20554730e-02.
ABC INFO: t: 3, eps: 7.20554730e-02.
ABC INFO: Accepted: 500 / 1096 = 4.5620e-01, ESS: 4.2701e+02.
ABC INFO: Accepted: 500 / 1096 = 4.5620e-01, ESS: 4.2701e+02.
ABC INFO: t: 4, eps: 6.45272070e-02.
ABC INFO: t: 4, eps: 6.45272070e-02.
ABC INFO: Accepted: 500 / 1144 = 4.3706e-01, ESS: 4.2139e+02.
ABC INFO: Accepted: 500 / 1144 = 4.3706e-01, ESS: 4.2139e+02.
ABC INFO: Stop: Maximum walltime.
ABC INFO: Stop: Maximum walltime.
ABC.History INFO: Done <ABCSMC id=5, duration=0:02:05.371858, end_time=2024-02-06 08:40:47>
ABC.History INFO: Done <ABCSMC id=5, duration=0:02:05.371858, end_time=2024-02-06 08:40:47>

It appears to be running the sampler twice? The RAM usage stays low as expected.

This is the output to screen with the fix_singlecore branch and sampler=pyabc.sampler.SingleCoreSampler():

ABC.History INFO: Start <ABCSMC id=6, start_time=2024-02-06 08:41:40>
ABC.History INFO: Start <ABCSMC id=6, start_time=2024-02-06 08:41:40>
ABC INFO: Calibration sample t = -1.
ABC INFO: Calibration sample t = -1.
Active threads: <function active_count at 0x7f4dbc321120>
[<_MainThread(MainThread, started 139971898045504)>]
Active threads: <function active_count at 0x7f4dbc321120>
[<_MainThread(MainThread, started 139971898045504)>]
Active threads: <function active_count at 0x7f4dbc321120>
[<_MainThread(MainThread, started 139971898045504)>]
Active threads: <function active_count at 0x7f4dbc321120>
[<_MainThread(MainThread, started 139971898045504)>]
Active threads: <function active_count at 0x7f4dbc321120>
[<_MainThread(MainThread, started 139971898045504)>]
Active threads: <function active_count at 0x7f4dbc321120>
[<_MainThread(MainThread, started 139971898045504)>]
Active threads: <function active_count at 0x7f4dbc321120>
[<_MainThread(MainThread, started 139971898045504)>]
Active threads: <function active_count at 0x7f4dbc321120>
....

The RAM usage immediately starts climbing.
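One note on the debug output above: the text `<function active_count at 0x...>` is what Python prints for the function object itself, so the "Active threads" lines do not show an actual count. How the branch produces that line is not shown in this thread; the sketch below only illustrates, with a hypothetical helper describe_threads, how the count and thread names can be read with the standard threading module.

```python
import threading


def describe_threads():
    """Return (count, names) for the threads currently alive in this process."""
    count = threading.active_count()  # call the function to get an int
    names = [t.name for t in threading.enumerate()]
    return count, names


if __name__ == "__main__":
    count, names = describe_threads()
    print(f"Active threads: {count}")
    print(names)
```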

Gabriel-p avatar Feb 06 '24 11:02 Gabriel-p

Thanks for the information @Gabriel-p - we are currently still troubleshooting the issue. We will push the fix, when it's ready, to the fix_singlecore branch for you.

stephanmg avatar Feb 07 '24 15:02 stephanmg

@Gabriel-p this might be related to this issue: https://github.com/ICB-DCM/pyPESTO/issues/1312

Could you please try again the fix_singlecore branch?

stephanmg avatar Feb 29 '24 16:02 stephanmg

@stephanmg just tested the fix_singlecore branch, the issue is still there

Gabriel-p avatar Feb 29 '24 22:02 Gabriel-p

Thanks for testing so quickly. I had hoped the issue would go away in light of this, but it seems that we need to dig deeper.

stephanmg avatar Mar 01 '24 04:03 stephanmg

@arrjon could you review my changes? I think this issue should be resolved ASAP.

stephanmg avatar Jun 26 '25 10:06 stephanmg

@Gabriel-p could you confirm whether the issue persists in the latest pyABC release 0.12.16?
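Before re-testing, it can help to confirm which version the active environment actually imports. A minimal sketch using the standard-library importlib.metadata follows; parse_version and installed_at_least are hypothetical helpers, and the parsing assumes purely numeric version components such as 0.12.16.

```python
from importlib import metadata


def parse_version(text):
    """Turn '0.12.16' into (0, 12, 16); assumes purely numeric components."""
    return tuple(int(part) for part in text.split("."))


def installed_at_least(package, minimum):
    """True/False if the package is installed, None if it is not installed."""
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        return None
    return parse_version(installed) >= parse_version(minimum)


if __name__ == "__main__":
    # "pyabc" is the distribution name shown in the environment listing above.
    print(installed_at_least("pyabc", "0.12.16"))
```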

stephanmg avatar Jun 27 '25 11:06 stephanmg

The issue appears to be solved in this new version.

Gabriel-p avatar Jun 27 '25 11:06 Gabriel-p

Thanks for the quick reply.

If the issue surfaces again, we will have to scrutinize pyABC again. In that case, feel free to re-open this issue (#626).

stephanmg avatar Jun 27 '25 11:06 stephanmg