nwb-schema
nwb-schema copied to clipboard
Add ExternalImageSeries
The ImageSeries neurodata_type is a subtype of TimeSeries. TimeSeries requires the dataset data; however, ImageSeries makes data optional because external_file may be provided in place of data. This breaks the rules of inheritance -- a child must have all required fields of a parent, and this inconsistency creates challenges in https://github.com/NeurodataWithoutBorders/pynwb/pull/1274
This PR makes ImageSeries.data required and updates the documentation. If external_file is provided, then data should be set to an empty 3D array.
See also discussion here: https://github.com/NeurodataWithoutBorders/pynwb/issues/1220
Seeking your input @oruebel @ajtritt @bendichter
This has always been annoying. Maybe we should just create a separate data type for external videos
This has always been annoying. Maybe we should just create a separate data type for external videos
I am also in favor of this solution and would prefer it. The new data type ExternalImageSeries (?) could inherit from a base Events type https://github.com/rly/ndx-events since it would consist of just timestamps and a dataset of filenames.
he new data type
ExternalImageSeries(?) could inherit from a base Events
I'm fine with creating a separate type, but I don't fully understand why this would be treated as Events? Isn't this just a regular timeseries (with timestamps, data, unit etc.) where the only difference is that the data is a 1D array of strings indicating the file that corresponds to the respective image frame.
he new data type
ExternalImageSeries(?) could inherit from a base EventsI'm fine with creating a separate type, but I don't fully understand why this would be treated as Events? Isn't this just a regular timeseries (with timestamps, data, unit etc.) where the only difference is that the
datais a 1D array of strings indicating the file that corresponds to the respective image frame.
So if a single tiff file is the external image file for 300 timestamps, then data would just be an array consisting of that filename string 300 times?
That would be fine with me. It uses a bit more space than it needs to, but it is clear and explicit.
Do we need to replicate the filename 300 times? That's even worse than the current solution.
need to test creating an empty array in MatNWB
After discussion with @bendichter, we decided to:
- Add a new type
ExternalImageSerieswhich inherits fromNWBDataInterface. It contains a dataset of external file paths and an attribute indicating the frame within the image series at which each external file begins (copied from the oldImageSeries, but I improved the documentation).- Q: Unlike all other types that end in
Series, this is not aTimeSeries. I am OK with this exception, but it should be noted. What do you think? - This does not inherit from the
Eventstype because we want to allow users to specify a starting time and sampling rate, which theEventstype does not support. We could add that option for theEventstype, but it does not seem useful for the vast majority of use cases of theEventstype. Thus, this type inherits fromNWBDataInterfaceand has both a timestamps dataset and a starting_time dataset, just likeTimeSeries.
- Q: Unlike all other types that end in
- Make
ImageSeries/datarequired - Discourage the use of
ImageSeries/external_fileand settingImageSeries/formatto something other than the default value of 'raw' with the long-term goal of deprecating it.
@bendichter How are labs using storing time series of multi-channel image data in the NWB file, if at all? According to the dims and doc for ImageSeries, the data of an ImageSeries should either be N single-channel images (e.g., grayscale) or N single-channel volumes. Is anyone storing N multi-channel images (e.g., RGB or RGBA) in an ImageSeries? If so, that is not the documented use of this type, and it would not be possible for a user or machine to distinguish between a volume and a multi-channel image. AFAIK, there is no type that supports storing N multi-channel images, except perhaps IndexSeries.
Would it be better to have an ImageSeries type for images and a VolumeSeries type for volumes?
See also https://github.com/NeurodataWithoutBorders/nwb-schema/issues/316 where we changed OpticalSeries to have dimensions (frame, x, y, rgb)
We could call it an ExternalImageStack if we want to maintain consistency that a "Series" is always a TimeSeries
Also worth noting with this change is that all types that extend ImageSeries and use external files will need to have a corresponding new External type. E.g. ExternalTwoPhotonSeries. That's doable but not ideal...
To summarize this issue and PR:
Problem:
The ImageSeries neurodata_type is a subtype of TimeSeries. TimeSeries requires the dataset 'data'; however, ImageSeries makes 'data' optional because 'external_file' may be provided in place of data. This breaks the rules of inheritance -- a child must have all required fields of a parent, and this inconsistency creates challenges in the API and validation.
The main use cases of ImageSeries (AFAIK) are:
- user wants to store a time series of images acquired from the brain (e.g., two-photon images, which are often single multi-image/page TIFF files or an ordered set of single-image image files), either in the original image format to maximize interoperability with other tools or embedded as raw values in the file to maximize portability
- user wants to store a time series of image stimuli presented to the animal, either in the original image format to maximize interoperability with other tools or embedded as raw values in the file to maximize portability
- user wants to store video recordings of the animal / environment, in the original video format (e.g., MP4) because HDF5 video compression is poor
- user/developer wants to display images acquired from the brain, video recordings, or image stimuli individually or as an animated movie
- user/developer wants to perform computation on the images individually or all together (e.g., mean, filter, ROI extraction, pose estimation)
In the majority of cases, the user wants to store the raw image/video data outside of the NWB file to maximize interoperability with other tools.
Some possible solutions:
Option 1: Make ImageSeries/data required. If 'external_file' is provided, then 'data' should be set to an empty 3D array.
Pros:
- Space-efficient
Cons:
- If a starting time & rate are provided, then there is no way to know how many images were presented without counting the number of frames of the individual external files.
Option 2: Make ImageSeries/data required. If 'external_file' is provided, then 'data' should be set to a 1D array with the same length as timestamps where each element is the filename of the external file corresponding to that time. For example, if 'external_file' has 1 file with 300 frames, then 'data' is a string array containing 'filename.tiff' repeated 300 times. If 'external_file' has 10 files each with 1 frame, then 'data' has the 10 filenames, i.e., the same as 'external_file'.
Pros:
- Clear format
Cons:
- A little harder to enter the data correctly (e.g., in MatNWB)
- Not as space-efficient but usually there aren't that many image files
Option 3: Make ImageSeries/data required. Aim to deprecate the use of ImageSeries and types that extend it for storing external image files. Create a new type ExternalImageSeries and corresponding types that extend it for storing external image files. Other types that link to an ImageSeries will now also need to link to an ExternalImageSeries (or a new common base type).
Pros:
- Clear delineation between the two types of image storing methods
- No rules within
ImageSeriesor new types that require understanding the documentation (e.g., if field1 is x, then field2 is required and field3 should be set this way) -- these are confusing and not easily validatable - Space-efficient
Cons:
- More types, confusing
Option 4: Make new ImageObjectSeries where 'data' is a dataset of references to Image or ExternalImage objects (is reftype: A or B possible? or do we need to make a BaseImage type?). Add new ExternalImage object that is a reference to a single file and an optional index into that file for multi-image files. Aim to deprecate the use of ImageSeries in favor of this type.
Pros:
- Reduces duplication of how image data are represented across types
Cons:
- Datasets of references are hard to inspect outside of the APIs
r How are labs using storing time series of multi-channel image data in the NWB file, if at all? According to the
dimsanddocforImageSeries, the data of anImageSeriesshould either be N single-channel images (e.g., grayscale) or N single-channel volumes. Is anyone storing N multi-channel images (e.g., RGB or RGBA) in anImageSeries? If so, that is not the documented use of this type, and it would not be possible for a user or machine to distinguish between a volume and a multi-channel image. AFAIK, there is no type that supports storing N multi-channel images, except perhapsIndexSeries.Would it be better to have an
ImageSeriestype for images and aVolumeSeriestype for volumes?
Labs with multiple channels are storing them as multiple ImageSeries or TwoPhotonSeries objects
After discussion with @oruebel, we have decided to go with Option 1 for now and put the other changes in this PR regarding handling of external files on hold for NWB 2.5.0. This will give us more time to solicit input from relevant parties and decide on the best course of action.
Additionally, we had a few more thoughts:
Option 5. Make ImageSeries/data required. Change external_file to be a list of filenames, one per timestamp. If external_file is provided, then data should be a 1-D array of frame numbers of the corresponding file in external_file.
Option 4 is not ideal for image data stored in the file because it could result in 100s or 1000s of Image objects stored non-contiguously with each other. This results in inefficient data access for reading a stack, computing across images in a stack, and compressing image data in a stack.
Option 3 is not ideal because all extensions of ImageSeries would have to define an extension of ExternalImageSeries.