movement icon indicating copy to clipboard operation
movement copied to clipboard

Added Support for loading time co-ordinates

Open TomatoChocolate12 opened this issue 9 months ago • 16 comments

Before submitting a pull request (PR), please read the contributing guide.

Please fill out as much of this template as you can, but if you have any problems or questions, just leave a comment and we will help out :)

Description

What is this PR

  • [ ] Bug fix
  • [x] Addition of a new feature
  • [ ] Other

Why is this PR needed? This PR fixes issue #473

What does this PR do? This PR adds support for loading time co-ordinates in the form of a vector.

References

Please reference any existing issues/PRs that relate to this PR.

Issue #473

How has this PR been tested?

Please explain how any new code has been tested, and how you have ensured that no existing functionality has changed.

I have run all the pre-commit hooks and so far there seems to be no issues. I have not added validators for the new timesteps argument though. Also, I have not added any new tests for this enhancement.

Is this a breaking change?

If this PR breaks any existing functionality, please explain how and why.

Does this PR require an update to the documentation?

If any features have changed, or have been added. Please explain how the documentation has been updated.

Checklist:

  • [ ] The code has been tested locally
  • [ ] Tests have been added to cover all new functionality
  • [x] The documentation has been updated to reflect any changes
  • [x] The code has been formatted with pre-commit

TomatoChocolate12 avatar Mar 07 '25 23:03 TomatoChocolate12

hello maintainers!

from my understanding of the issue, currently the time-coordinates were being generated with the help of fps, but as there is a chance that there could be irregularity with the frames, this method is not always correct. so, it was necessary functionality to let the user add a vector of timestamps as an input.

i have added a new parameter to the ValidPosesDataset and ValidBboxesDataset classes which signifies timestamps. I have updated the documentation accordingly and also changed the functionality of _ds_from_valid_data to make the changes.

I have not added any tests for this so far, and will do so on review by the maintainers. Also, I have not added validators for the timestamp vector as I was not sure if it was necessary.

TomatoChocolate12 avatar Mar 07 '25 23:03 TomatoChocolate12

Codecov Report

Attention: Patch coverage is 60.00000% with 4 lines in your changes missing coverage. Please review.

Project coverage is 99.05%. Comparing base (e808116) to head (e355f10). Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
movement/io/load_bboxes.py 50.00% 2 Missing :warning:
movement/io/load_poses.py 50.00% 2 Missing :warning:
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #479      +/-   ##
==========================================
- Coverage   99.32%   99.05%   -0.27%     
==========================================
  Files          28       28              
  Lines        1477     1485       +8     
==========================================
+ Hits         1467     1471       +4     
- Misses         10       14       +4     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

codecov[bot] avatar Mar 10 '25 11:03 codecov[bot]

Hi @TomatoChocolate12, thanks for opening this PR!

From my understanding of the issue, the time coordinates were previously generated using fps. However, since there could be irregularities in the frames, this method is not always reliable. It was therefore necessary to introduce functionality that allows the user to provide a vector of timestamps as an input.

That’s precisely the issue, yes.

I have added a new timestamps parameter to the ValidPosesDataset and ValidBboxesDataset classes. I have also updated the documentation accordingly and modified the _ds_from_valid_data function to implement these changes.

Starting with the validators is a good approach. I agree with the general logic you've implemented—if the user provides timestamps, they should take precedence over fps. The priority for assigning time coordinates should be:

  1. Directly assign the provided timestamps. We need to ensure that they are sequentially increasing and that their length matches the length of the time dimension in the data (these checks should be included in the validators).
  2. If no timestamps are provided, derive time coordinates from the given fps (as is currently done).
  3. If neither fps nor timestamps are available, use ascending integers (frame indices) as time coordinates (as is currently done).

That said, implementing this would require additional work beyond modifying the validators and _ds_from_valid_data:

  • We need to expose the timestamps argument in the public functions of the load_poses submodule, as these are the actual user-facing functions where timestamps would be supplied.
  • We should consider which formats to support for timestamps. For example, should we accept only an array of floats (in seconds), or should we also support datetime formats? This requires some research into existing timestamp conventions.

If you'd like to explore this further, I’d suggest thinking about these two questions and proposing a solution within this PR. It would also be helpful to update the PR description by replacing the placeholder text from our template with a summary of what you've implemented.

Thanks again for your work on this!

niksirbi avatar Mar 10 '25 11:03 niksirbi

hello @niksirbi , thanks for the review.

i added timestamps to all the user-facing functions and also added validators to check timestamps.

about the two questions that you posed, i would propose to ensure that the timestamps are in the millisecond format, but should be positively responsive to datetime input as well. We could always just convert the datetime values to milliseconds. I would assume that as the library is working with animal movement videos, the videos will likely not extend more than a couple of hours max. also, if the library wants to analyse the flapping of an insects' wing, etc. it would need timestamps in millisecond values.

so, this would be my proposed solution:

timestamps accepts both datetime and numerical (positive values). the numerical values are assumed to be in milliseconds. the datetime value could be converted to milliseconds within the code.

TomatoChocolate12 avatar Mar 12 '25 13:03 TomatoChocolate12

Thanks @TomatoChocolate12. I would have to over this PR again, and think about the milliseconds + datetime support. I will get back to you on this.

niksirbi avatar Mar 12 '25 18:03 niksirbi

I have a different but related question. It would be nice to add support for date+time (as in, actually date of the day and time for every frame). In my use-case (and probably in other cases), the videos (and files) contains the timestamp the video was started. It would be nice to have, in addition to the time, the datetime as a coord.

I tried to do so myself (by replacing time with datetime) but got into issues with the kinematics calculations and lates with the resampling functions.

The way I am doing it right now is like so:

    datetime_vector = pd.date_range(
        start=recording_datetime,
        periods=len(combined_container.coords['time']),
        freq=pd.DateOffset(seconds=1/fps)
    )
    
    combined_container['datetime'] = ('time', datetime_vector)

I can continue doing this but I was wondering if its something that should be implemented in the future.

hummuscience avatar Apr 01 '25 13:04 hummuscience

Thanks for the input, @hummuscience, this is a really valuable point.

Yes, we definitely want to support proper datetime handling in the future. It will be particularly important for synchronising behavioural data with other time-aligned sources like neurophysiological recordings. That said, it’s a non-trivial change and will likely require several PRs to fully implement. This current PR is just the first step in that direction.

We’ve started thinking about the best way to support datetime data in movement. Here’s a rough (and still evolving) roadmap:

  • At the moment, we represent elapsed time in seconds from the start of the recording, and we assume a constant frame rate (fps).

  • The next step, which we’re hoping to achieve in this PR, is to allow passing a custom array of (elapsed) time-stamps, removing the constant frame rate assumption.

  • From there, we want to enable mapping to actual calendar datetimes. There are a few possible approaches:

    1. Store only elapsed time as the time coordinate, but add a start_datetime field in the dataset metadata. This would allow us to compute the actual datetime for any frame if needed. We could provide convenience methods for converting between the two. From your code snippet, it looks like you’re doing something very similar, so I hope we’re on the same page here.
    2. Provide both elapsed time and datetime as coordinate labels for the time dimension. This would allow indexing based on either representation while keeping existing functionality (which relies on elapsed time) intact. I'm not entirely sure if xarray supports multiple coordinate labels for the same dimension in a clean way — this needs further exploration.
    3. Fully convert the time coordinate to datetimes and use that as the standard. Personally, I’d be hesitant about this: while it may make sense in some domains, it adds a lot of complexity, especially for users who don’t need datetimes and just want simple elapsed seconds. Floats are simpler to work with, and many applications don’t require absolute time.

So in short, yes, datetime support is definitely on the roadmap, but we want to implement it in a way that’s both flexible and minimally disruptive. Your use case and example are really helpful in guiding that design, so feel free to further clarify.

niksirbi avatar Apr 01 '25 14:04 niksirbi

I think option 1 is the easiest for now. Maybe even just as an example in one of the tutorials, since I am sure someone will actually stumble onto the same problem.

Option 3 got me into some weird issues at a later point, so yeah, that would be problematic.

Another thing that perhaps goes with this is concatenation of multiple videos. In my use-case, my setup produces 15-minute long videos with a timestamp (to avoid frame skips when camera buffers are full). Possibly too specific of a use-case, but I also had to deal with correct orders of the movement objects and correcting the ~time~ coordinate so it's correct when concatenated.

hummuscience avatar Apr 01 '25 14:04 hummuscience

I think option 1 is the easiest for now. Maybe even just as an example in one of the tutorials, since I am sure someone will actually stumble onto the same problem.

Starting with an example is a good idea. Would you be willing to have a go at contributing an example (could be loosely based on your workflow)? No pressure, of course.

Another thing that perhaps goes with this is concatenation of multiple videos. In my use-case, my setup produces 15-minute long videos with a timestamp (to avoid frame skips when camera buffers are full). Possibly too specific of a use-case, but I also had to deal with correct orders of the movement objects and correcting the ~time~ coordinate so it's correct when concatenated.

Not a specific use-case at all, many labs do that, for the same reason as you. So basically, you had to time-shift each 15-min data segment, such that the time axis makes sense after concatenation? If so, may be also a good addition for the aforementioned example.

niksirbi avatar Apr 01 '25 14:04 niksirbi

Alright, I will put in an example of how I am doing this :)

hummuscience avatar Apr 01 '25 16:04 hummuscience

Quick question. Is there some sample data with consecutive files?

hummuscience avatar Apr 01 '25 16:04 hummuscience

Quick question. Is there some sample data with consecutive files?

Hmm, no we don't have that. I'm wondering how we could go around this. We could add more sample data like this, but we also want to keep the sample data files small (they are fetched during CI, and we don't want that to slow us down). We could simulate the situation by splitting an existing sample dataset into "chunks" and then re-assembling them, but the example would feel a bit contrived if we first implement the split ourselves and then undo it.

Perhaps we do want to add such sample data then. But to keep things small, maybe just 2 files, and only the predicted pose tracks (no videos). Would you have such a pair of files to share, in one of our supported formats?

niksirbi avatar Apr 01 '25 19:04 niksirbi

I think I can supply two/three files (to also demonstrate ordering and correcting the timestamps).

hummuscience avatar Apr 02 '25 08:04 hummuscience

I think I can supply two/three files (to also demonstrate ordering and correcting the timestamps).

Perfect! Feel free to share them with me any way that's convenient to you, including a short description of what's inside. I can have a look and upload them to our data repository.

niksirbi avatar Apr 02 '25 08:04 niksirbi

Putting this here for future reference (feel free to delete and put where necessary).

using the scale (movement.transform) on an xArray that has datetime as a data variable, fails:

Traceback (most recent call last):
  File "<string>", line 17, in __PYTHON_EL_eval
  File "/Volumes/projects/OWpsilocybin/code/in_roi.py", line 17, in <module>
    'mid_backend', 'mid_backend2', 'mid_backend3', 'tail_base', 'left_shoulder',
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/abdelhaym/.conda/envs/movement-env/lib/python3.12/site-packages/movement/transforms.py", line 66, in scale
    scaled_data = data * factor
                  ~~~~~^~~~~~~~
  File "/Users/abdelhaym/.conda/envs/movement-env/lib/python3.12/site-packages/xarray/core/_typed_ops.py", line 223, in __mul__
    return self._binary_op(other, operator.mul)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/abdelhaym/.conda/envs/movement-env/lib/python3.12/site-packages/xarray/core/dataset.py", line 7581, in _binary_op
    ds = self._calculate_binary_op(g, other, join=align_type)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/abdelhaym/.conda/envs/movement-env/lib/python3.12/site-packages/xarray/core/dataset.py", line 7650, in _calculate_binary_op
    new_vars = {k: f(self.variables[k], other_variable) for k in self.data_vars}
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/abdelhaym/.conda/envs/movement-env/lib/python3.12/site-packages/xarray/core/_typed_ops.py", line 935, in __mul__
    return self._binary_op(other, operator.mul)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/abdelhaym/.conda/envs/movement-env/lib/python3.12/site-packages/xarray/core/variable.py", line 2331, in _binary_op
    f(self_data, other_data) if not reflexive else f(other_data, self_data)
    ^^^^^^^^^^^^^^^^^^^^^^^^
numpy._core._exceptions._UFuncBinaryResolutionError: ufunc 'multiply' cannot use operands with types dtype('<M8[ns]') and dtype('float64')

hummuscience avatar Apr 10 '25 14:04 hummuscience

Hey @TomatoChocolate12, apologies for the long silence on this. I’ve been quite busy and also spent some time reading up on datetime data types and exploring timestamp functionality, mainly because I needed to figure things out for a different project.

I think I now have a better grasp of the problem and what’s needed for a workable solution in movement. We also discussed this PR today in the community call.

In short, I no longer think we should continue down this path and would instead like to suggest an alternative approach.

Adding timestamps as an argument to the public loader functions (as you’ve done here) is likely to get quite complicated over time. As this PR already shows, we need to update multiple functions to support passing timestamps as an argument, plus the corresponding tests. And this is just the simplest case—timestamps as an array of floats in seconds. Realistically, we’d need to handle more complex cases, like timestamps in various datetime formats. Implementing type conversions and deciding what takes precedence (timestamps vs. fps) would make the loading functions overly complex.

Also, keep in mind that the loader functions are often the first part of movement that users encounter. Presenting them with too many options at that stage might be confusing, especially since many users don’t need timestamps functionality at all.

For all these reasons, I’d like to propose a different way forward:

Keep the loader functions unchanged, just as they are now and instead implement the timestamps functionality as a separate module—say, movement.utils.time. This module could offer utilities for:

  • Reading timestamps from a file or array-like object and assigning them as time coordinates to existing xr.Dataset/xr.DataArray objects.
  • Creating datetime timestamps from a known start_datetime and fps.
  • Converting between datetime and “time elapsed” formats, so users can switch back and forth.

Users who need timestamps could:

  • Load datasets as before, using an estimated fps or leaving it as None.
  • Then use the new time utilities to adjust time coordinates as needed (seconds, datetimes, regular or irregular intervals, etc.).

The downside is that users would need to take an extra step instead of everything being handled during data loading. However, this separation keeps the I/O module simpler and allows users to flexibly adjust time representations at any point in their analysis.

I plan to start working on these “time utilities” in a separate branch, building on what I’ve learned from my recent work. There’s still a lot to refine, especially ensuring that all of movement’s existing functionality continues to work seamlessly with datetime indices. As @hummuscience flagged above, some functions will need to be updated to handle this properly, which requires a good grasp of the entire codebase.

I’m sorry we won’t move forward with your proposed implementation. That’s no reflection on your work—I really appreciate your efforts here. Often, it’s only by trying out different approaches that we clarify the best way forward. Your contribution was instrumental in helping us think this through.

I really appreciate your help in getting us this far!

niksirbi avatar May 30 '25 14:05 niksirbi

@niksirbi feel free to keep me updated when you have a new branch testing this. Since this is my general use case (100s of videos) and I would love to help find a good way to deal with this

hummuscience avatar May 30 '25 14:05 hummuscience

@niksirbi feel free to keep me updated when you have a new branch testing this. Since this is my general use case (100s of videos) and I would love to help find a good way to deal with this

Will do. I now got my hands on a similar dataset, with videos saved in 1-hour segments (spanning weeks), so I can develop and debug this functionality.

niksirbi avatar May 30 '25 17:05 niksirbi

Writing here to express a +1 for the need for this and to follow further conversations on timestamp loading

vigji avatar Jun 15 '25 12:06 vigji