
Fix `Annotations` time formats

Open · tsbinns opened this pull request 10 months ago • 4 comments

Reference issue (if any)

Fixes #13108

What does this implement/fix?

When reading annotations from a csv file, nanoseconds are now truncated from `orig_time` by converting it to microsecond precision. The exception is the default time of 1970-01-01 00:00:00, which is written to such files when `Annotations.orig_time=None`: converting it to microseconds would cause it to be read back as a real time instead of being mapped back to `None` as intended.
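
For illustration, the truncation amounts to something like this (a minimal pandas sketch with an arbitrary example timestamp, not the actual reader code):

import pandas as pd

# an orig_time string with nanosecond precision, as found in affected csv files
orig_time = "2002-12-03 19:01:11.720100123"

ts = pd.Timestamp(orig_time)
# truncate to microsecond precision, the most that the datetime handling expects
ts_us = ts.floor("us")
print(ts_us)  # 2002-12-03 19:01:11.720100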

Also truncates nanoseconds from the times in utils.dataframe._convert_times() when time_format="datetime" by converting to dtype=datetime64[us]. Should this also be done when time_format="timedelta"?
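
The dtype cast in question, shown in isolation (a sketch assuming the tz-naive DatetimeIndex that _convert_times() produces, and pandas >= 2.0 for non-nanosecond units):

import numpy as np
import pandas as pd

# times in seconds, one with sub-microsecond resolution
seconds = np.array([0.0, 0.123456789])
times = pd.to_timedelta(seconds, unit="s") + pd.Timestamp("2002-12-03 19:01:11")

print(times.dtype)                           # datetime64[ns]
print(times.astype("datetime64[us]").dtype)  # datetime64[us], nanoseconds dropped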

Finally, updates the unit tests so that they check that csv files with nanosecond times get assigned a proper orig_time when read in, and that Annotations.to_data_frame() returns times with nanoseconds truncated (which in turn means that truncated times are saved).
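
Roughly the kind of check involved (a hypothetical pytest sketch assuming the onset/duration/description csv layout that Annotations.save() writes; not the actual test code):

import mne


def test_read_csv_nanosecond_orig_time(tmp_path):
    """Sketch: nanosecond onsets in a csv should still yield a valid orig_time."""
    fname = tmp_path / "annot.csv"
    fname.write_text(
        "onset,duration,description\n"
        "2002-12-03 19:01:11.720100001,1.0,test\n"
    )
    annot = mne.read_annotations(fname)
    # with this PR, nanoseconds are truncated instead of breaking orig_time
    assert annot.orig_time is not None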

tsbinns avatar Feb 10 '25 18:02 tsbinns

Also updated the docstring for Annotations.orig_time and added a RuntimeWarning when a string is provided but it gets converted to None. Thoughts on this?

tsbinns avatar Feb 10 '25 18:02 tsbinns

The code for annotations works fine, but changing _convert_times() has some knock-on effects for converting data objects to dataframes.

This is an alternative approach to fix the current errors:

elif time_format == "datetime":
    # make ISO8601 (microsecond) compatible; to_timedelta and DatetimeIndex
    # come from pandas. Keeping the result as a DatetimeIndex (rather than
    # calling .to_numpy()) means each element is a Timestamp, which supports
    # .replace(nanosecond=0)
    times = to_timedelta(times + first_time, unit="s") + meas_date
    times = DatetimeIndex(
        [time.replace(nanosecond=0) for time in times], tz=times.tz
    )

but I'm not 100% sure how others want to proceed, so I'll pause for now.

tsbinns avatar Feb 10 '25 21:02 tsbinns

Hey @drammock, could I please get your input on the proposed changes? The original annotations issue is resolved, but there are some implementation details that weren't discussed in the issue that I'd like suggestions on. That would also help with deciding how to resolve the issues with the dataframe conversions.

tsbinns avatar Mar 10 '25 13:03 tsbinns

Turns out the alternative _convert_times() approach above doesn't fully resolve things. Once the proposed changes to _convert_times() are better established, I can tackle this properly.

tsbinns avatar Mar 10 '25 13:03 tsbinns

@larsoner I'll have a look through this again tomorrow, refamiliarise myself, and start trying to fix the failures. Then I'll see if I have some more concrete questions!

tsbinns avatar Nov 05 '25 19:11 tsbinns

Okay, so everything is green now.

This PR makes it so that nanoseconds in times are dropped when reading annotations. As noted here (https://github.com/mne-tools/mne-python/issues/13108#issuecomment-2648331187), sanitising on read will prevent issues with annotations created with MNE v1.7-1.10, as well as any created going forward, so I think this could already be considered a sufficient solution.

Still, that comment also mentions sanitising annotations on write by adapting mne.utils.dataframe._convert_times() (which is called when saving annotations) so that datetimes are limited to microsecond precision. However, _convert_times() is used for dataframe conversion of more than just annotations, so changing it has knock-on effects, e.g. for Raw -> DataFrame conversion. I have removed the changes that were causing problems, but I still think an argument can be made for sanitising on write.

For example, if I create annotations on MNE 1.11 (when it's released), I might want to share these annotations with colleagues who are using different environments and different MNE versions. If they are using MNE < 1.11, they will encounter the same issues reading these annotations, which would be avoided if we sanitise annotations on write.

Rather than altering _convert_times() and dealing with the wider package implications, we could just have the sanitisation take place in Annotations.to_data_frame(), something like:

    def to_data_frame(self, time_format="datetime"):
        """Export annotations in tabular structure as a pandas DataFrame.
            ...
        """
        pd = _check_pandas_installed(strict=True)
        valid_time_formats = ["ms", "timedelta", "datetime"]
        dt = _handle_meas_date(self.orig_time)
        if dt is None:
            dt = _handle_meas_date(0)
        time_format = _check_time_format(time_format, valid_time_formats, dt)
        dt = dt.replace(tzinfo=None)
        times = _convert_times(self.onset, time_format, dt)
        # ↓↓↓ New code
        if time_format == "datetime":
            # remove nanoseconds; `dt` was made tz-naive above, so the times
            # returned by _convert_times() carry no timezone and a plain
            # microsecond dtype suffices
            times = times.astype("datetime64[us]")
        # -------------------
        df = dict(onset=times, duration=self.duration, description=self.description)
        if self._any_ch_names():
            df.update(ch_names=self.ch_names)
        df = pd.DataFrame(df)
        extras_df = pd.DataFrame(self.extras)
        df = pd.concat([df, extras_df], axis=1)
        return df
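
With that change, a quick check would look something like the following (hypothetical usage; the point is that the onsets no longer carry nanosecond resolution):

import mne

annot = mne.Annotations(
    onset=[1.0],
    duration=[0.5],
    description=["test"],
    orig_time="2002-12-03 19:01:11.720100",
)
df = annot.to_data_frame()
# expected with the change above: a microsecond-resolution datetime column
print(df["onset"].dtype)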

Alternatively, if people feel datetimes for all dataframes should not contain nanoseconds, I am happy to discuss this, but I feel those changes go beyond the scope of this PR.

WDYT @drammock @larsoner?

tsbinns avatar Nov 06 '25 13:11 tsbinns

Rather than altering _convert_times() and dealing with the wider package implications

Have you considered

def _convert_times(..., *, drop_nano=False):
    ...
    if drop_nano:
        ...

and then, only in the places where you need it for annots, call with drop_nano=True? Seems like this would make it so we only drop nanoseconds when needed.
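
For reference, the keyword could be threaded through roughly like this (the body is an illustrative approximation of the existing helper, not the actual implementation, and assumes a tz-naive meas_date in the drop_nano path plus pandas >= 2.0):

import numpy as np
from pandas import to_timedelta


def _convert_times(times, time_format, meas_date=None, first_time=0, *, drop_nano=False):
    """Sketch: convert times in seconds to ms, timedelta, or datetime."""
    if time_format == "ms":
        times = np.round(times * 1e3).astype(np.int64)
    elif time_format == "timedelta":
        times = to_timedelta(times + first_time, unit="s")
    elif time_format == "datetime":
        times = to_timedelta(times + first_time, unit="s") + meas_date
        if drop_nano:
            # only callers that need microsecond precision (annotation export)
            # opt in; other conversions keep nanosecond resolution
            times = times.astype("datetime64[us]")
    return times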

larsoner avatar Nov 06 '25 15:11 larsoner

Yeah, I can do that. But for now we would only set drop_nano=True for Annotations -> DataFrame conversions, correct?

tsbinns avatar Nov 06 '25 15:11 tsbinns

Yeah, I think so. Though admittedly I do not have my head 100% wrapped around the issue, so I'm happy to go with what you think makes sense, @tsbinns; I think you have given this issue (much) more deliberate consideration!

larsoner avatar Nov 06 '25 15:11 larsoner

Okay. I haven't seen any issues reported with nanoseconds in dataframe conversions of other classes, so I'll keep it simple and stick with enabling this only for annotations!

tsbinns avatar Nov 06 '25 15:11 tsbinns

Just pushed a commit that drops nanoseconds only for Annotations -> DataFrame conversions. Should be everything sorted once tests are green.

tsbinns avatar Nov 06 '25 16:11 tsbinns

That's @larsoner's comments addressed.

Here's an example of what the warning about a bad orig_time passed to Annotations looks like now:

RuntimeWarning: The format of the `orig_time` string is not
recognised. It must conform to the ISO8601 format with at most
microsecond precision and where the delimiter between date and
time is ' '. Got: 2002-12-03 19:01:11.720100000. Defaulting
`orig_time` to None.
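
For context, something like this would now trigger it (hypothetical usage based on the behaviour described in this PR):

import mne

annot = mne.Annotations(
    onset=[1.0],
    duration=[0.5],
    description=["test"],
    # nanosecond precision is not a recognised orig_time format
    orig_time="2002-12-03 19:01:11.720100000",
)
print(annot.orig_time)  # None, after the RuntimeWarning above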

tsbinns avatar Nov 06 '25 16:11 tsbinns