Q: Is there a way to speed up reading a small set of attributes from an .nwb file?
NB: Filing against pynwb since the code operates on its constructs, not against hdmf, although it seems that nearly all of the time is spent there.
ATM a script like

```python
import sys
from time import time

from pynwb import NWBHDF5IO

t0 = time()
for path in sys.argv[1:]:
    with NWBHDF5IO(path, "r") as io:
        nwb = io.read()
        print(path, nwb.identifier)
print("Took %.3f seconds to read a single attribute from %d files" % (time() - t0, len(sys.argv[1:])))
```
shows that it takes ~2.7 seconds to read a single attribute:
```
$> python ../trash/quick_attr.py /home/yoh/proj/dandi/nwb-datasets/najafi-2018-nwb/data/FN_dataSharing/nwb/mouse1_fni16_150817_001_ch2-PnevPanResults-170808-190057.nwb
/home/yoh/proj/dandi/nwb-datasets/najafi-2018-nwb/data/FN_dataSharing/nwb/mouse1_fni16_150817_001_ch2-PnevPanResults-170808-190057.nwb 150817_001_ch2-PnevPanResults-170808-190057
Took 2.685 seconds to read a single attribute from 1 files
```
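A quick way to confirm where those seconds go (a hypothetical profiling snippet, not from the original session) is to profile just the read:

```python
# Profile only io.read(); consistent with the NB above, most cumulative time
# is expected to show up in hdmf's build machinery rather than raw HDF5 I/O.
import cProfile
from pynwb import NWBHDF5IO

path = "file.nwb"  # placeholder path
with NWBHDF5IO(path, "r") as io:
    cProfile.runctx("io.read()", globals(), locals(), sort="cumtime")
```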
whereas analogous code using h5py directly takes ~1 ms:
```
$> cat ../trash/quick_attr_h5py.py
import sys
from time import time

import h5py

t0 = time()
for path in sys.argv[1:]:
    with h5py.File(path, "r") as h5file:
        print(path, h5file["identifier"][()])
print("Took %.3f seconds to read a single attribute from %d files" % (time() - t0, len(sys.argv[1:])))
$> python ../trash/quick_attr_h5py.py /home/yoh/proj/dandi/nwb-datasets/najafi-2018-nwb/data/FN_dataSharing/nwb/mouse1_fni16_150817_001_ch2-PnevPanResults-170808-190057.nwb
/home/yoh/proj/dandi/nwb-datasets/najafi-2018-nwb/data/FN_dataSharing/nwb/mouse1_fni16_150817_001_ch2-PnevPanResults-170808-190057.nwb 150817_001_ch2-PnevPanResults-170808-190057
Took 0.001 seconds to read a single attribute from 1 files
```
I wonder if there is anything that could be done on the pynwb/hdmf side (maybe there are some options, or the code construct above is not what I should have used, etc.) to speed things up for cases where no heavy loading of data is needed, only basic structure/attributes; or whether for such cases it would be recommended to resort to direct use of h5py?
Thank you in advance for the guidance! ref: https://github.com/dandi/dandi-cli/issues/10
---

Loading the pynwb module is pretty slow at the moment. A lot of time is currently spent parsing the YAML schema files. @ajtritt has looked into ways of pickling the TypeMap to speed up the initial load. He may be able to offer some guidance.
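A speculative sketch of that idea follows; it assumes the TypeMap object pickles cleanly (which is exactly what is being investigated), and the cache filename is made up:

```python
# Speculative: cache the parsed TypeMap with pickle so later runs skip the
# YAML schema parsing; whether TypeMap pickles cleanly is an open question.
import os
import pickle

import pynwb
from hdmf.build import BuildManager
from pynwb import NWBHDF5IO

CACHE = "typemap.pkl"  # hypothetical cache location
path = "file.nwb"      # placeholder path

if os.path.exists(CACHE):
    with open(CACHE, "rb") as f:
        type_map = pickle.load(f)      # reuse the cached schema machinery
else:
    type_map = pynwb.get_type_map()    # parses the YAML schema files (slow)
    with open(CACHE, "wb") as f:
        pickle.dump(type_map, f)

with NWBHDF5IO(path, "r", manager=BuildManager(type_map)) as io:
    nwb = io.read()
```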
---

Actually, never mind. I see that you are timing just the file-reading part of your sample code. Reading the file involves building all of the classes and objects behind the scenes. It might be nice to have a feature that reads a single attribute or dataset from an NWB file, e.g., `id = NWBHDF5IO(path, "r", path="/identifier")`. That seems tricky to do efficiently. I would probably use h5py for the time being.
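In the meantime, a minimal sketch of that h5py workaround; the helper name is made up, and it assumes the NWB 2 layout where /identifier is a scalar string dataset at the file root (as in the question's own timing):

```python
# Minimal sketch of the h5py workaround; read_nwb_field is a made-up helper,
# and /identifier is assumed to be a scalar string dataset at the root.
import h5py

def read_nwb_field(path, name="identifier"):
    with h5py.File(path, "r") as f:
        value = f[name][()]
    # depending on the h5py version, string datasets come back as bytes
    return value.decode() if isinstance(value, bytes) else value

print(read_nwb_field("file.nwb"))  # ~1 ms instead of seconds
```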
---

> nice to have a feature that reads a single attribute or dataset from an NWB

I think for PyNWB it may make more sense to stick to the single-container (i.e., single neurodata_type) level if you are thinking about partial read.
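For illustration, here is what container-level granularity already looks like at the HDF5 layer with h5py; the group name below is hypothetical, and only explicitly requested slices are read from disk:

```python
# Illustration of container-level (single neurodata_type) access via h5py;
# "acquisition/test_timeseries" is a hypothetical container path.
import h5py

with h5py.File("file.nwb", "r") as f:
    ts = f["acquisition/test_timeseries"]  # one container's HDF5 group
    print(dict(ts.attrs))                  # neurodata_type, namespace, ...
    data = ts["data"]                      # lazy dataset handle, no I/O yet
    head = data[:10]                       # only this slice is read from disk
```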
---

> > nice to have a feature that reads a single attribute or dataset from an NWB
>
> I think for PyNWB it may make more sense to stick to the single-container (i.e., single neurodata_type) level if you are thinking about partial read.

Unfortunately that does not resolve the MWE situation, where the attribute is on NWBFile itself.