pynwb icon indicating copy to clipboard operation
pynwb copied to clipboard

[Bug]: Timeseries can have timestamps + data that don't match in length

Open jonahpearl opened this issue 1 year ago • 4 comments

What happened?

I use the convenient timeseries pointer feature, where you can just store timestamps in one timeseries and have many others point to them. This saves a lot of memory! However, I just fixed a bug that arose because some data I stored in a timeseries didn't have the same length as the timeseries (due to a processing mistake). Then I went and checked, and it turns out that timeseries' timestamps don't need to have the same length as the data at all! I consider this a bug, given that timeseries are meant as a core unit of data in NWB, but let me know if I'm just using them wrong.

Steps to Reproduce

# normal case
t = nwb.TimeSeries(name='test', data=np.arange(10), timestamps=np.arange(10), unit='sec')

# I expect a value error here
nwb.TimeSeries(name='test', data=np.arange(10), timestamps=np.arange(5), unit='sec')

# This similarly should raise an error imo
s = nwb.TimeSeries(name='test2', data=np.arange(2), timestamps=t, unit='sec')

Traceback

n/a

Operating System

macOS

Python Executable

Conda

Python Version

3.9

Package Versions

nwb 2.0.0

Code of Conduct

jonahpearl avatar Aug 11 '22 18:08 jonahpearl

Hey there,

When running the latest version of PyNWB, (v2.1.0 - try pip install -U pynwb to update), you should see a warning along the lines of

UserWarning: <Object type> '<Object name>': Length of data does not match length of timestamps. Your data may be transposed. Time should be on the 0th dimension.

whenever such a size mismatch occurs.

As for why it's only a warning instead of an error, I believe the reasoning was to not break back-compatibility (is that right @lry @bendichter @oruebel?) or otherwise be too strict, as it can be a commonly observed mistake to unintentionally transpose the dimensions of a data field.

This is also a CRITICAL (highest-level) check implemented in the companion tool, the NWB Inspector (check implemented here). Though again I think the philosophy here is for something like this to not be an error that prevents the creation of the NWBFile, but rather a post-production quality check on the written file.

CodyCBakerPhD avatar Aug 11 '22 18:08 CodyCBakerPhD

As for why it's only a warning instead of an error, I believe the reasoning was to not break back-compatibility (is that right

Yes, that is correct. If we had made it an error then read of files that have this issue would not work due to the ValueError being triggered on read.

However, as of the most recent HDMF release, we now have the ability to detect in __init__ of Container classe (e.g,. TimeSeries) whether we are in construct mode (i.e., automatic construction during read) or not. This was implemented in https://github.com/hdmf-dev/hdmf/pull/751 . #1516 shows how this can be used in PynWB to implement checks in a way that you warn when reading from file and raise an error when constructing new data. Essentially, what it boils down to in most cases is that you simply call self._error_on_new_warn_on_construct(error_msg="my message") in __init__, which will either: 1) raise a ValueError when in user mode, 2) raise a warning when in construct mode (i.e., read from file), or 3) do nothing if error_msg=None is passed. In this way you can implement check function to either return a string or None and then simply call self._error_on_new_warn_on_construct(error_msg=self._check_timestamps_valid()) in __init__. If you need more complex behavior then you can also check if self._in_construct_mode: directly to distinguis between behavior during constuction on read and new creation by the user.

Long story short, with this, the check whether timestamps have the correct length could be updated to raise an error in user mode and still preserve backward compatibility by only raising a warning when reading an invalid file.

CC @weiglszonja since she implement this behavior for ImageSeries

oruebel avatar Aug 11 '22 19:08 oruebel

Got it, thank you both! I will update my version to get my desired behavior as a warning for now, which will at least alert me to it.

On Aug 11, 2022, at 3:28 PM, Oliver Ruebel @.***> wrote:

As for why it's only a warning instead of an error, I believe the reasoning was to not break back-compatibility (is that right

Yes, that is correct. If we had made it an error then read of files that have this issue would not work due to the ValueError being triggered on read.

However, as of the most recent HDFM release, we now have the ability to detect in init of Container classe (e.g,. TimeSeries) whether we are in construct mode (i.e., automatic construction during read) or not. This was implemented in hdmf-dev/hdmf#751 https://github.com/hdmf-dev/hdmf/pull/751 . The following #1516 https://github.com/NeurodataWithoutBorders/pynwb/pull/1516 shows how this can be used to implement checks in a way that you warn when reading from file and raise an error when constructing new data. Essentially, what it boils down in most cases is that you simply call self._error_on_new_warn_on_construct(error_msg="my message") in init, which will either: 1) raise a ValueError when in user mode, 2) raise a warning when in construct mode (i.e., read from file), or 3) do nothing if error_msg=None is passed. In this way you can implement check function to either return a string or None and then simpy call self._error_on_new_warn_on_construct(error_msg=self._check_timestamps_valid()) in init. If you need more complex behavior then you can also check if self._in_construct_mode: directly to distinguis between behavior during constuction on read and new creation by the user.

Long story short, with this, the check whether timestamps have the correct length could be updated to raise an error in user mode and still preserve backward compatibility by only raising a warning when reading an invalid file.

— Reply to this email directly, view it on GitHub https://github.com/NeurodataWithoutBorders/pynwb/issues/1536#issuecomment-1212399057, or unsubscribe https://github.com/notifications/unsubscribe-auth/AQKOLZCCDVG3E5FXHSWYPGDVYVH4NANCNFSM56JF4MRA. You are receiving this because you authored the thread.

jonahpearl avatar Aug 11 '22 20:08 jonahpearl

Thank you for raising this, I started working on a PR that should fix this behavior.

weiglszonja avatar Aug 11 '22 21:08 weiglszonja