pylhe

Slowdown and memory increase with time

Open · bfonta opened this issue 2 years ago · 7 comments

I'm using pylhe to loop over several LHE files, each containing 100K events. When I run the snippet below on an lxplus machine (CentOS Linux release 7.9.2009), the iterations become progressively slower, and eventually the job gets killed for using too much memory.

```python
import time

import pylhe

afile = "/afs/cern.ch/work/b/bfontana/public/Singlet_TManualV3_all_M280p00_ST0p14_L463p05_K1p00_cmsgrid_final.lhe"
atime = time.time()
# alternatively: pylhe.read_lhe_with_attributes(afile)
for ievt, evt in enumerate(pylhe.read_lhe(afile)):
    if ievt % 5000 == 0:
        # time spent on the last 5000 events
        print(time.time() - atime)
        atime = time.time()
        print(" - {} events processed".format(ievt))
```

The slowdown becomes significant at around iteration 40K to 50K. I would expect no memory increase, given that we are dealing with a generator. Is this behavior expected? I'm using Python 3.5.6 (GCC 6.2.0).
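To check whether memory really grows with the event count (and not only the time per chunk), the timing loop above could be extended with the standard library's `tracemalloc`. This is a minimal sketch: `profile_iteration` is a hypothetical helper, not part of pylhe, and the synthetic generator stands in for `pylhe.read_lhe(afile)`. Note that `tracemalloc` only sees allocations made through Python's allocators.

```python
import time
import tracemalloc
from typing import Iterable, List, Tuple


def profile_iteration(items: Iterable, every: int = 5000) -> List[Tuple[int, float, int]]:
    """Hypothetical helper: consume `items` and, every `every` items, record
    (index, seconds since the previous checkpoint, currently traced bytes)."""
    records = []
    tracemalloc.start()
    try:
        last = time.perf_counter()
        for i, _ in enumerate(items):
            if i % every == 0:
                now = time.perf_counter()
                current, _peak = tracemalloc.get_traced_memory()
                records.append((i, now - last, current))
                last = now
    finally:
        tracemalloc.stop()
    return records


if __name__ == "__main__":
    # Synthetic stand-in for pylhe.read_lhe(afile): holds no references between items.
    def fake_events(n):
        for i in range(n):
            yield [float(i)] * 10

    for i, dt, mem in profile_iteration(fake_events(20000), every=5000):
        print(f"{i:>7d} events  {dt:.3f} s  {mem / 1024:.1f} KiB traced")
```

With a truly streaming reader the traced figure should stay roughly flat; if it climbs steadily with the event count, objects are being retained somewhere instead of being freed.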

bfonta, Aug 28 '23

As a cross-check, I also tried removing `enumerate` (which is lazy anyway), but the slowdown looks identical:

```python
import itertools as it
import time

import pylhe

afile = "/afs/cern.ch/work/b/bfontana/public/Singlet_TManualV3_all_M280p00_ST0p14_L463p05_K1p00_cmsgrid_final.lhe"
events = pylhe.read_lhe_with_attributes(afile)

atime = time.time()
for ievt in range(100000):
    if ievt % 5000 == 0:
        # time spent on the last 5000 iterations
        print(time.time() - atime)
        atime = time.time()
        print(" - {} events processed".format(ievt))

    # note: islice(events, 1, 2) discards one event before returning the next,
    # so each call advances the generator by two events
    event = next(it.islice(events, 1, 2))
```
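In this cross-check, `next(it.islice(events, 1, 2))` does not simply fetch the next event: `islice` with a start of 1 discards one item before yielding, so every call advances the underlying generator by two events. Assuming a file holds exactly 100K events, 50000 such calls would exhaust it and the next call would raise `StopIteration`. A plain counting generator (a stand-in for `pylhe.read_lhe_with_attributes`) shows the difference:

```python
import itertools as it


def count_up():
    """Stand-in for an event generator: yields 0, 1, 2, ..."""
    n = 0
    while True:
        yield n
        n += 1


events = count_up()
# islice(events, 1, 2) skips one item, then yields one: two items consumed per call
first = next(it.islice(events, 1, 2))
second = next(it.islice(events, 1, 2))
print(first, second)  # 1 3

events = count_up()
# next(events) consumes exactly one item per call
a, b = next(events), next(events)
print(a, b)  # 0 1
```

For a one-event-per-iteration loop, `next(events)` is the direct equivalent.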

bfonta, Aug 28 '23

> I'm using Python 3.5.6 (GCC 6.2.0).

In that case, you're using a Python version that hasn't been supported since PR https://github.com/scikit-hep/pylhe/pull/47, i.e. since before pylhe v0.1.0.

Please reproduce the issue with a supported Python version (pylhe supports Python 3.8 or newer) and describe your environment: the pylhe version, plus a way to recreate the minimal environment needed to reproduce the behavior.

```console
$ eol python
┌───────┬────────────┬─────────┬────────────────┬────────────┬────────────┐
│ cycle │  release   │ latest  │ latest release │  support   │    eol     │
├───────┼────────────┼─────────┼────────────────┼────────────┼────────────┤
│ 3.11  │ 2022-10-24 │ 3.11.5  │   2023-08-24   │ 2024-04-01 │ 2027-10-24 │
│ 3.10  │ 2021-10-04 │ 3.10.13 │   2023-08-24   │ 2023-04-05 │ 2026-10-04 │
│ 3.9   │ 2020-10-05 │ 3.9.18  │   2023-08-24   │ 2022-05-17 │ 2025-10-05 │
│ 3.8   │ 2019-10-14 │ 3.8.18  │   2023-08-24   │ 2021-05-03 │ 2024-10-14 │
│ 3.7   │ 2018-06-26 │ 3.7.17  │   2023-06-05   │ 2020-06-27 │ 2023-06-27 │
│ 3.6   │ 2016-12-22 │ 3.6.15  │   2021-09-03   │ 2018-12-24 │ 2021-12-23 │
│ 3.5   │ 2015-09-12 │ 3.5.10  │   2020-09-05   │   False    │ 2020-09-13 │
│ 3.4   │ 2014-03-15 │ 3.4.10  │   2019-03-18   │   False    │ 2019-03-18 │
│ 3.3   │ 2012-09-29 │ 3.3.7   │   2017-09-19   │   False    │ 2017-09-29 │
│ 2.7   │ 2010-07-03 │ 2.7.18  │   2020-04-19   │   False    │ 2020-01-01 │
│ 2.6   │ 2008-10-01 │ 2.6.9   │   2013-10-29   │   False    │ 2013-10-29 │
└───────┴────────────┴─────────┴────────────────┴────────────┴────────────┘
```
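The environment details requested above could be gathered with a short script. This is a minimal sketch assuming Python 3.8+ (it relies on `importlib.metadata`); `environment_report` is a hypothetical helper, and the `>= (3, 8)` check simply mirrors the minimum quoted in the reply.

```python
import platform
import sys
from importlib import metadata


def environment_report(package: str = "pylhe") -> dict:
    """Hypothetical helper: collect interpreter and package versions for a bug report."""
    try:
        pkg_version = metadata.version(package)
    except metadata.PackageNotFoundError:
        pkg_version = None  # package not installed in this interpreter
    return {
        "python": platform.python_version(),
        "python_supported": sys.version_info >= (3, 8),
        "platform": platform.platform(),
        package: pkg_version,
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Running it with the same interpreter that runs the reproducer makes it clear which Python and pylhe versions were actually picked up.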

matthewfeickert, Aug 28 '23

If you have access to an lxplus machine, you can run the following, where test.py is one of the scripts above (you should have access to the input file):

```bash
# python 3.9.12 and pylhe 0.7.0
source /cvmfs/sft.cern.ch/lcg/views/LCG_103/x86_64-centos7-gcc11-opt/setup.sh
python test.py
```

As an alternative, I've also run the script in a mamba environment:

```bash
# python 3.11.5 and pylhe 0.7.0
mamba create -n TestPyLHE python=3 pylhe
mamba activate TestPyLHE

# the requested Python version is not picked up by default, but the interpreter below has pylhe 0.7.0
python3.11 test.py
```

Both methods lead to the behavior reported in the first post.

bfonta, Aug 29 '23

@matthewfeickert Is there any update?

bfonta, Sep 07 '23

Hello @bfonta, I confess I'm too loaded at the moment to profile and investigate this in detail myself. If you fancy contributing, maybe you could dig deeper with, say, pyinstrument?
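pyinstrument shows where time is spent; to locate the memory growth itself, the standard library's `tracemalloc` (a swapped-in technique, not one proposed in the thread) can compare two snapshots taken a few thousand events apart. A minimal sketch: `top_growth` and `leaky_events` are hypothetical stand-ins, and in practice one would pass `pylhe.read_lhe(afile)` instead of the synthetic generator.

```python
import tracemalloc
from typing import Iterable, List

# Ignore allocations made by tracemalloc itself (e.g. the first snapshot's data).
_IGNORE_SELF = [tracemalloc.Filter(False, tracemalloc.__file__)]


def top_growth(items: Iterable, warmup: int, window: int,
               limit: int = 5) -> List[tracemalloc.StatisticDiff]:
    """Hypothetical helper: consume `warmup` items, snapshot, consume `window`
    more items, snapshot again, and return the source lines whose traced
    allocations changed the most between the two snapshots."""
    iterator = iter(items)
    tracemalloc.start()
    try:
        # zip stops once range() is exhausted, so at most `warmup` items are pulled
        for _ in zip(range(warmup), iterator):
            pass
        before = tracemalloc.take_snapshot().filter_traces(_IGNORE_SELF)
        for _ in zip(range(window), iterator):
            pass
        after = tracemalloc.take_snapshot().filter_traces(_IGNORE_SELF)
    finally:
        tracemalloc.stop()
    return after.compare_to(before, "lineno")[:limit]


if __name__ == "__main__":
    retained = []  # simulates references that are never released

    def leaky_events(n):
        """Stand-in event generator that deliberately keeps every event alive."""
        for i in range(n):
            event = {"id": i, "momenta": [float(i)] * 20}
            retained.append(event)  # this kept reference is the simulated leak
            yield event

    for stat in top_growth(leaky_events(30000), warmup=10000, window=10000):
        print(stat)
```

Source lines that show up at the top with a large positive `size_diff` are the ones allocating objects that survive from one event to the next.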

eduardo-rodrigues, Sep 07 '23

I pinged due to an approaching deadline; I can try to have a look at it, but I am also currently quite loaded. Thank you for the suggestion, though.

bfonta, Sep 07 '23

I appreciate and understand the situation. From our side, I can also say that a community endeavour only works if there is at least a little community engagement, and even the simplest contributions are super welcome (unfortunately, this one is not a 10-minute piece of work).

Thanks a lot.

eduardo-rodrigues, Sep 07 '23