suite2p
suite2p copied to clipboard
BUG: Can not handle an h5 file with 5D data structure
Describe the issue:
When suite2p
try to open an h5 file with 5D data structure, it can not iterate through frames with this error.
TypeError: Indexing arrays must have integer dtypes
The cause of the error is the fact that arguments of np.arange
receives int / int
when hdims != 3
.
Reproduce the code example:
# needs an h5 file with 5D array structure.
Error message:
Traceback (most recent call last):
File "/home/jihoon_jeong/miniforge3/envs/2p_prep/lib/python3.9/site-packages/IPython/core/interactiveshell.py", line 3550, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-2-5c404d568d0f>", line 1, in <module>
f[key][irange, ...]
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "/home/jihoon_jeong/miniforge3/envs/2p_prep/lib/python3.9/site-packages/h5py/_hl/dataset.py", line 831, in __getitem__
selection = sel.select(self.shape, args, dataset=self)
File "/home/jihoon_jeong/miniforge3/envs/2p_prep/lib/python3.9/site-packages/h5py/_hl/selections.py", line 82, in select
return selector.make_selection(args)
File "h5py/_selector.pyx", line 282, in h5py._selector.Selector.make_selection
File "h5py/_selector.pyx", line 197, in h5py._selector.Selector.apply_args
TypeError: Indexing arrays must have integer dtypes
Version information:
0.14.0
Context for the issue:
No response
Double value in np.arange was not the only problem.
Fix 1
In line 63-64, nframes_all
which I think a variable representing total number of frames, only consider the first two dimension. This works fine with 4D, but when 5D data is provided,
the algorithm must consider number of frames * number of planes * number of channels
nframes_all = f[key].shape[0] if hdims == 3 else f[key].shape[0] * f[key].shape[1]
Fix 2
I don't know why there is this comment data = (nchan x) (nframes x) nplanes x pixels x pixels
If that comment is right, than...
- both nframes x nplanes x pixels x pixels and nchan x nframes x pixels x pixels should work for 4D data,
- and 5D data with nchan x nframes x nplanes x pixels x pixels
but the second case of the 4D and the 5D fail because of the line 81
im = f[key][irange, ...]
It would be much safer to check the shape of the data and raise error when the data shape didn't match with ops.npy
.
Same thing applies to line 82-83
Fix 3
I don't understand the logic in line 97 i0 = nchannels * ((j) % nplanes)
as ((j) % nplanes)
is always j
For the 5D, don't we want something like this?
[fr0 ch0 pl0], [fr0 ch0 pl1], [fr0 ch1 pl0],[fr0 ch1 pl1] [fr1 ch0 pl0], ...
In this case, i0 = j
will do the job as it will move one index further for each plane.
Then line 98-99 and line 104-105 should be changed too.
Minor
line 87 if type(im[0, 0, 0]) == np.uint16:
would be much direct if im.dtype == np.uint16:
Please correct me if I'm wrong. BTW, it's an awesome package! Thank you for your job!
Thanks @knowblesse for noticing this and trying to fix this issue. We don't use h5 files ourselves, but I think this deinterleaving of irange is necessary for scanbox converted files? I believe these files had data arrays which were/are 3D/4D which needed to be converted to 5D format to put into our binaries (tiffs have this sort of ordering too from Scanimage). But I honestly don't remember who we added this specific 4D/5D fix for.
If possible, ideally the fix would be backwards compatible, e.g. for Fix 1,
nframes_all = np.prod(np.array(f[key].shape[:-2]))
For Fix 2 -- check if the h5 file has the right dimensions -- if so, skip all of the deinterleaving steps and write straight to binary?
I'm not sure if Fix 3 means there is a bug for these sorts of files already anyway?
Hi Dr. Stringer, I'm Ji Hoon, and thank you for your comment! Your feedback is truly an honor! Our lab, led by PI Tristan Geiller at Yale, has recently embarked on building our own data processing pipeline. During this process, we had the pleasure of discovering your remarkable package. After careful consideration, we decided to adopt the H5 format with a 5D array over the TIFF stack option.
I actually fixed the issue, and tested on 3D, 4D (with multiple channels), 4D (with multiple planes), and 5D (both multiple channels and planes). I've submitted a pull request (#1112), and I would greatly appreciate your review.
- While I cannot guarantee compatibility with the "scanbox file" as I've never used it, I've ensured functionality for H5 files. Additionally, I've created a test dataset labeled with each frame's information, which I'm happy to share.
- For issue 1, I've implemented the exact code as suggested.
- Regarding issue 2, I've included additional error-checking scripts in the commit. However, if you prefer a cleaner codebase without the messy sanity checks, I can remove them.
- As for issue 3, the original code won't save the frame properly for the 5D data in current state. I recommend testing this behavior with the sample data I've used, as flattening arrays can often be a source of confusion.
Thank you!