pynwb icon indicating copy to clipboard operation
pynwb copied to clipboard

warning when user write data in the wrong orientation

Open bendichter opened this issue 2 years ago • 6 comments

Briefly describe the needed feature as well as the reasoning behind it

The most common mistake people make when writing NWB files is that they write a time series with time in the second dimension instead of the first. This isn't always possible to detect, but there are a few give-aways.

  1. If they use timestamps, the length of timestamps should match the first dimension.
  2. If they are writing an RoiResponseSeries or an ElectricalSeries, the length of rois and electrodes should match the second dimension.
  3. If the matrix is fat, not tall, technically it is possible that this is what they intended, but very likely it is not so we might want to throw a warning.

I know this was sort of part of dimensional scales, but would it be possible to just add these checks in the appropriate classes? The one exception I can think of is for (1) if you are using an external source, you likely won't have a properly sized data.

bendichter avatar Aug 31 '21 16:08 bendichter

cc @CodyCBakerPhD

bendichter avatar Aug 31 '21 16:08 bendichter

This is a good idea. Yes, ImageSeries will need to be handled specially.

rly avatar Aug 31 '21 19:08 rly

So if we want to handle ImageSeries separately, then we'll need to control whether this is validated. Can we make this a flag of TimeSeries? Then we can have that flag control whether data_shape_validation is called where data_shape_validation is something like

def data_shape_validation(data, rate=None, starting_time=None, timestamps=None):
	if timestamps is not None and not isinstance(data, AbstractDataChunkIterator):
		assert len(data) == len(timestamps), "If timestamps are defined, the length of timestamps should match the first dimension of data."

For ElectricalSeries, we can also have something like

def dimension_validation(data, electrodes, name=None):
	if data.shape[1] != len(electrodes):
		if len(data) == len(electrodes):
			raise ValueError(f"The `data` field of `ElectricalSeries` {name} data is in the wrong orientation. It should be time x channels, not channels x time.")
		else:
			raise ValueError(f"The second dimension of the `data` field of `ElectricalSeries` {name} does not match the length of electrodes. `data.shape` = {data.shape}, len(electrodes) = {len(electrodes)}")

bendichter avatar Aug 31 '21 20:08 bendichter

Just a few thoughts. I think validation of this sort of thing may appear at there different stages in PyNWB:

  1. When creating a container class (or modifying attributes of the container)
  2. On read/write during the object mapping
  3. In the validator when validating files

I know this was sort of part of dimensional scales, but would it be possible to just add these checks in the appropriate classes?

Yes, dimension scales should make this much easier to do in a formal way, where we can define more rigorous checks because the schema provides us with additional structured information about how data should look.

This being said, the schema is about structure of data and so there will likely always be additional checks that one may want to do for specific types to ensure that data is correct. Adding these sort of checks in the Container classes makes sense (e.g., as part of the constructor, setter functions, or just before write). However, rather than on-off, custom checks in each class, I think it may be useful to create a more formal design pattern for this. E.g., we could define a "check_container" method on the base Container class that the child-classes can then implement to define custom checks for the given type. In this way we then also have more flexibility later on to decide if/when checks should be executed. E.g., in this way these checks could also become an optional (configurable) step in the validator.

oruebel avatar Aug 31 '21 20:08 oruebel

A related issue is https://github.com/NeurodataWithoutBorders/matnwb/issues/305 Checking that colnames in DynamicTable matches the names of the VectorData columns. PyNWB autogenerates this dataset so this should not be an issue when creating a file with PyNWB but when reading and existing file that may have been modified, this should be checked.

oruebel avatar Sep 03 '21 16:09 oruebel

This feature request also comes up from @lfrank where his lab found data where the length of electrodes of an ElectricalSeries did not match the second dimension of data. It would be great to add these checks in.

rly avatar Dec 08 '21 21:12 rly