pylhe
Slowdown and memory increase with time
I'm using pylhe to loop over several LHE files, each containing 100K events. Running the snippet below on an lxplus machine (CentOS Linux release 7.9.2009), one can see that iterations become slower as time progresses, and eventually the job gets killed for using too much memory.
import pylhe
import time

afile = "/afs/cern.ch/work/b/bfontana/public/Singlet_TManualV3_all_M280p00_ST0p14_L463p05_K1p00_cmsgrid_final.lhe"
atime = time.time()
for ievt, evt in enumerate(pylhe.read_lhe(afile)):  # pylhe.read_lhe_with_attributes
    if ievt % 5000 == 0:
        print(time.time() - atime)
        atime = time.time()
        print(' - {} events processed'.format(ievt))
The significant slowdown occurs at iteration ~40K/50K. I would expect no memory increase given that we are dealing with a generator. Is the above behavior expected? I'm using Python 3.5.6 (GCC 6.2.0).
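To put numbers on the memory growth alongside the timing, the loop could be wrapped in something like the sketch below. `profile_loop` is a hypothetical helper, not part of pylhe; it uses only the standard-library `time` and `tracemalloc` modules and works on any iterable. Note that `tracemalloc` only sees allocations made through Python's allocators and adds noticeable overhead, so the timings are indicative at best.

```python
import time
import tracemalloc


def profile_loop(iterable, every=5000):
    """Consume `iterable`; every `every` items record (count, seconds, traced bytes)."""
    reports = []
    tracemalloc.start()
    start = time.perf_counter()
    for count, _item in enumerate(iterable, start=1):
        if count % every == 0:
            now = time.perf_counter()
            current, _peak = tracemalloc.get_traced_memory()
            reports.append((count, now - start, current))
            start = now
    tracemalloc.stop()
    return reports


if __name__ == "__main__":
    # Stand-in iterable; replace range(20000) with pylhe.read_lhe(afile)
    # to instrument the real loop.
    for count, seconds, nbytes in profile_loop(range(20000)):
        print("{} events: {:.3f} s, {} bytes traced".format(count, seconds, nbytes))
```

If the traced memory keeps rising from one chunk to the next, something upstream of the loop is holding on to already-processed events.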
As a cross-check, I also tried to get rid of enumerate (which is lazy), but the slowdown seems identical:
import pylhe
import itertools as it
import time

afile = "/afs/cern.ch/work/b/bfontana/public/Singlet_TManualV3_all_M280p00_ST0p14_L463p05_K1p00_cmsgrid_final.lhe"
events = pylhe.read_lhe_with_attributes(afile)
# Get event 1
atime = time.time()
for ievt in range(100000):  # pylhe.read_lhe_with_attributes
    if ievt % 5000 == 0:
        print(time.time() - atime)
        atime = time.time()
        print(' - {} events processed'.format(ievt))
    event = next(it.islice(events, 1, 2))
I'm using Python 3.5.6 (GCC 6.2.0).
In that case you're using a Python version that pylhe hasn't supported since PR https://github.com/scikit-hep/pylhe/pull/47, i.e. since before pylhe v0.1.0.
Please reproduce your issue with a currently supported version (pylhe supports Python 3.8 or newer) and specify your environment: state the pylhe version and provide a way to replicate the minimal environment required to produce the behavior.
$ eol python
┌───────┬────────────┬─────────┬────────────────┬────────────┬────────────┐
│ cycle │ release │ latest │ latest release │ support │ eol │
├───────┼────────────┼─────────┼────────────────┼────────────┼────────────┤
│ 3.11 │ 2022-10-24 │ 3.11.5 │ 2023-08-24 │ 2024-04-01 │ 2027-10-24 │
│ 3.10 │ 2021-10-04 │ 3.10.13 │ 2023-08-24 │ 2023-04-05 │ 2026-10-04 │
│ 3.9 │ 2020-10-05 │ 3.9.18 │ 2023-08-24 │ 2022-05-17 │ 2025-10-05 │
│ 3.8 │ 2019-10-14 │ 3.8.18 │ 2023-08-24 │ 2021-05-03 │ 2024-10-14 │
│ 3.7 │ 2018-06-26 │ 3.7.17 │ 2023-06-05 │ 2020-06-27 │ 2023-06-27 │
│ 3.6 │ 2016-12-22 │ 3.6.15 │ 2021-09-03 │ 2018-12-24 │ 2021-12-23 │
│ 3.5 │ 2015-09-12 │ 3.5.10 │ 2020-09-05 │ False │ 2020-09-13 │
│ 3.4 │ 2014-03-15 │ 3.4.10 │ 2019-03-18 │ False │ 2019-03-18 │
│ 3.3 │ 2012-09-29 │ 3.3.7 │ 2017-09-19 │ False │ 2017-09-29 │
│ 2.7 │ 2010-07-03 │ 2.7.18 │ 2020-04-19 │ False │ 2020-01-01 │
│ 2.6 │ 2008-10-01 │ 2.6.9 │ 2013-10-29 │ False │ 2013-10-29 │
└───────┴────────────┴─────────┴────────────────┴────────────┴────────────┘
If you have access to an lxplus machine, you can run the following, where test.py is the name of one of the scripts above (you should have access to the input file):
# python 3.9.12 and pylhe 0.7.0
source /cvmfs/sft.cern.ch/lcg/views/LCG_103/x86_64-centos7-gcc11-opt/setup.sh
python test.py
As an alternative, I've also run the script in a mamba environment:
# python 3.11.5 and pylhe 0.7.0
mamba create -n TestPyLHE python=3 pylhe
mamba activate TestPyLHE
# the desired Python version is not picked up by default, but the version below includes pylhe 0.7.0
python3.11 test.py
Both methods lead to the behavior reported in the first post.
@matthewfeickert Is there any update?
Hello @bfonta, from my side I confess I'm too busy at the moment to profile and investigate this in detail. If you'd like to contribute, maybe you could try digging deeper with, say, pyinstrument?
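As a possible starting point for that kind of investigation, here is a minimal sketch using only the standard library (`tracemalloc` rather than pyinstrument). It does not use pylhe at all: `make_fake_lhe`, `read_events` and `peak_memory` are hypothetical helpers, and the input is a synthetic XML stand-in, not a real LHE file. It only illustrates one generic way a generator built on `xml.etree.ElementTree.iterparse` can still accumulate memory, namely that parsed elements keep their content unless they are cleared. Whether this is what happens inside pylhe is an assumption that would need checking against its source.

```python
import io
import tracemalloc
import xml.etree.ElementTree as ET


def make_fake_lhe(n_events):
    """Build an in-memory XML document of <event> blocks (a stand-in, not real LHE)."""
    payload = "0.0 " * 50
    body = "".join("<event>{} {}</event>".format(i, payload) for i in range(n_events))
    return io.BytesIO(("<LesHouchesEvents>" + body + "</LesHouchesEvents>").encode())


def read_events(fileobj, clear):
    """Yield the text of each <event>; optionally clear each element after use."""
    for _, element in ET.iterparse(fileobj, events=["end"]):
        if element.tag == "event":
            yield element.text
            if clear:
                # Drop the element's text and children so they can be freed.
                element.clear()


def peak_memory(n_events, clear):
    """Return the peak traced memory (bytes) while looping over all events."""
    fileobj = make_fake_lhe(n_events)
    tracemalloc.start()
    count = sum(1 for _ in read_events(fileobj, clear))
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    assert count == n_events
    return peak


if __name__ == "__main__":
    print("peak without clear:", peak_memory(5000, clear=False))
    print("peak with clear:   ", peak_memory(5000, clear=True))
```

Comparing the two peaks on the same input shows how much of the footprint comes from elements that were already consumed. Running the same comparison against the real reader would be the next step.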
I pinged due to an approaching deadline; I can try to have a look at it, but I am also currently quite loaded. Thank you for the suggestion, though.
I appreciate and understand the issue. From our side I can also say that a community endeavour can only be one if there is at least a little community engagement, and even the simplest contributions are very welcome (the one at hand is not a 10-minute piece of work, unfortunately).
Thanks a lot.