pygama
DataLoader performance is bad?
At the analysis workshop, it was argued that the slowness of the DataLoader was due only to I/O speed. When I loaded data with "low-level routines", it never felt that slow. To test this quantitatively, I wrote a quick script comparing low-level loading with DataLoader loading.
The script is:
```python
import glob
import re
import time

import numpy as np

import lgdo.lh5_store as store
from legendmeta import LegendMetadata
from pygama.flow import DataLoader


def low_level_load(period="p03", run="r000", ndets=10):
    start = time.time()

    meta_path = "/data2/public/prodenv/prod-blind/ref/v01.06/inputs"
    f_hit = sorted(glob.glob("/data2/public/prodenv/prod-blind/ref/v01.06/generated/tier/hit/phy/" + period + "/" + run + "/*.lh5"))
    f_tcm = [e.replace("hit", "tcm") for e in f_hit]
    dt_files = time.time() - start
    print(f"time to locate files: \t {dt_files:.3f} s")

    lmeta = LegendMetadata(path=meta_path)
    chmap = lmeta.channelmap(re.search(r"\d{8}T\d{6}Z", f_hit[0]).group(0))
    geds = list(chmap.map("system", unique=False).geds.map("daq.rawid").keys())[:ndets]
    dt_meta = time.time() - start - dt_files
    print(f"time to load meta: \t {dt_meta:.3f} s")

    # load TCM data to define an event
    nda = store.load_nda(f_tcm, ["array_id", "array_idx", "cumulative_length"], "hardware_tcm_1/")
    clt = nda["cumulative_length"]
    split = clt[np.diff(clt, append=[clt[-1]]) < 0]
    ids = np.split(nda["array_id"], np.cumsum(split))
    idx = np.split(nda["array_idx"], np.cumsum(split))
    dt_tcm = time.time() - start - dt_meta
    print(f"time to load tcm: \t {dt_tcm:.3f} s")

    for ch in geds:
        idx_ch = [idx[i][ids[i] == ch] for i in range(len(idx))]
        nda = store.load_nda(f_hit, ["cuspEmax_ctc_cal", "AoE_Classifier"], f"ch{ch}/hit/", idx_list=idx_ch)
        mask = nda["cuspEmax_ctc_cal"] > 25
        nda["AoE_Classifier"][mask]  # apply the cut (result unused in this benchmark)
    dt_end = time.time() - start - dt_tcm
    print(f"time to load data: \t {dt_end:.3f} s")


def data_loader_load(period="p03", run="r000", ndets=10):
    start = time.time()

    prodenv = "/data2/public/prodenv"
    dl = DataLoader(f"{prodenv}/prod-blind/ref/v01.06[setups/l200/dataloader]")
    file_query = f"period == '{period}' and run == '{run}' and datatype == 'phy'"
    dl.reset()
    dl.data_dir = prodenv
    dl.set_files(file_query)
    dt_files = time.time() - start
    print(f"time to locate files: \t {dt_files:.3f} s")

    first_key = dl.get_file_list().iloc[0].timestamp
    lmeta = LegendMetadata(f"{prodenv}/prod-blind/ref/v01.06/inputs")
    chmap = lmeta.channelmap(on=first_key)  # get the channel map
    geds = list(chmap.map("system", unique=False).geds.map("daq.rawid").keys())[:ndets]
    dl.set_datastreams(geds, "ch")
    dt_meta = time.time() - start - dt_files
    print(f"time to load meta: \t {dt_meta:.3f} s")

    dl.set_cuts({"hit": "trapEmax_ctc_cal > 25"})
    dl.set_output(columns=["AoE_Classifier"], fmt="lgdo.Table")
    geds_el = dl.build_entry_list(tcm_level="tcm")
    dt_tcm = time.time() - start - dt_meta
    print(f"time to load tcm: \t {dt_tcm:.3f} s")

    dl.load(geds_el)
    dt_end = time.time() - start - dt_tcm
    print(f"time to load data: \t {dt_end:.3f} s")


if __name__ == "__main__":
    print("Try low level:")
    low_level_load()
    print("Try Data loader:")
    data_loader_load()
```
First, I would ask whether you can spot any unfair treatment of one routine or the other. Both routines should:
- Use data from period 03 run 000 for 10 detectors
- Set a cut on the energy > 25 keV
- Load the AoE values for the 10 geds where the cut is valid according to the TCM
The result of the script on LNGS is:

```
> python speed_test.py
Try low level:
time to locate files:    0.003 s
time to load meta:       0.931 s
time to load tcm:        1.605 s
time to load data:       10.845 s
Try Data loader:
time to locate files:    0.918 s
time to load meta:       0.763 s
time to load tcm:        172.463 s
time to load data:       7.972 s
```
What is going on with the TCM loading?!
Okay, thanks for testing, will check this ASAP.
It looks like this performance issue could be related to these: https://forum.hdfgroup.org/t/performance-reading-data-with-non-contiguous-selection/8979 and https://github.com/h5py/h5py/issues/1597

We're going to try the same workaround in `read_object` in order to speed up calls when an `idx` is provided.
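For context, the workaround discussed in those threads boils down to avoiding h5py's point-by-point fancy indexing: read one contiguous slab covering the requested rows, then sub-select in memory. A minimal sketch of the idea (the helper `read_rows` is hypothetical, not pygama/lgdo API, and it assumes the requested indices are dense enough that the slab is not much larger than the selection):

```python
import numpy as np


def read_rows(dset, idx):
    """Read rows `idx` from a sliceable dataset (e.g. an h5py Dataset).

    Handing h5py a fancy-index array triggers one point selection per
    element, which is the slowdown reported in the linked issue. Reading
    the contiguous slab spanning the requested rows with a single slice,
    then fancy-indexing the in-memory buffer, is often much faster when
    the indices are reasonably dense.
    """
    idx = np.asarray(idx)
    if idx.size == 0:
        return dset[0:0]
    lo, hi = int(idx.min()), int(idx.max()) + 1
    buf = dset[lo:hi]       # one contiguous read from disk
    return buf[idx - lo]    # in-memory fancy indexing
```

For sparse index lists the slab read can of course pull in far more data than needed, so this is a heuristic, not a universal fix.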
Where do we do indexed reading in the data loader other than in `.load()`, when an entry list is used?
The `idx` read appears to have been a red herring. I found a factor of ~3 speed-up for that TCM read step (really: build entry list) in `data_loader` and another factor of ~2 in `lgdo.table.get_dataframe`. However, the overall read is still a factor of ~3 slower than Patrick's low-level read:
```
Singularity> python kraus_speed_test.py
Try low level:
time to locate files:    0.003 s
time to load meta:       1.050 s
time to load tcm:        1.632 s
time to load data:       9.426 s
Try Data loader:
time to locate files:    0.726 s
time to load meta:       0.891 s
time to load tcm:        26.584 s
time to load data:       7.684 s
```
There is a lot of complexity in `data_loader` to do with handling multi-level cuts and other generalizations, but I hope it can be sped up further with some refactoring. It looks like there is still a lot going on in the inner loops.
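As one illustration of the kind of inner-loop work that can often be hoisted out: the low-level script above rescans the full TCM arrays once per channel (`idx[i][ids[i] == ch]`). A single stable argsort can group hit indices by channel in one pass instead. This is only a sketch of the general technique on flat arrays, not code from `data_loader` (the helper name is made up):

```python
import numpy as np


def group_idx_by_channel(ids, idx):
    """Group hit indices by channel id in one pass.

    `ids` and `idx` are flat, equal-length arrays: the channel id of
    each TCM hit and its row index in that channel's hit table. One
    stable argsort replaces the per-channel boolean scan
    `idx[ids == ch]`, which rescans the full array once per channel.
    """
    order = np.argsort(ids, kind="stable")
    sorted_ids = ids[order]
    # start position of each distinct channel id in the sorted order
    uniq, starts = np.unique(sorted_ids, return_index=True)
    groups = np.split(idx[order], starts[1:])
    return dict(zip(uniq.tolist(), groups))
```

For `n` hits and `k` channels this is O(n log n) instead of O(n * k), which matters once every detector in the channel map is loaded.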
Nice! Performance of the LGDO conversion methods is also a topic for https://github.com/legend-exp/legend-pydataobj/pull/30 by @MoritzNeuberger.
Did anyone try profiling the code to spot straightforward bottlenecks?
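One straightforward way is the stdlib `cProfile`: run either routine under the profiler and sort by cumulative time so the hottest call paths surface immediately. A small helper along these lines (the name `profile_call` is my own, not pygama API):

```python
import cProfile
import io
import pstats


def profile_call(fn, *args, n_lines=15, **kwargs):
    """Run `fn` under cProfile and print the hottest functions,
    sorted by cumulative time, then return `fn`'s result."""
    prof = cProfile.Profile()
    result = prof.runcall(fn, *args, **kwargs)
    buf = io.StringIO()
    stats = pstats.Stats(prof, stream=buf).sort_stats("cumulative")
    stats.print_stats(n_lines)
    print(buf.getvalue())
    return result


# e.g., with the script above:
# profile_call(data_loader_load, period="p03", run="r000", ndets=10)
```

Sorting by `"tottime"` instead of `"cumulative"` is useful for spotting tight inner loops rather than expensive call trees; `python -m cProfile -s cumulative speed_test.py` works too, without any code changes.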