Added Support for loading time co-ordinates
Before submitting a pull request (PR), please read the contributing guide.
Please fill out as much of this template as you can, but if you have any problems or questions, just leave a comment and we will help out :)
Description
What is this PR
- [ ] Bug fix
- [x] Addition of a new feature
- [ ] Other
Why is this PR needed? This PR fixes issue #473
What does this PR do? This PR adds support for loading time co-ordinates in the form of a vector.
References
Please reference any existing issues/PRs that relate to this PR.
Issue #473
How has this PR been tested?
Please explain how any new code has been tested, and how you have ensured that no existing functionality has changed.
I have run all the pre-commit hooks and so far there seems to be no issues. I have not added validators for the new timesteps argument though. Also, I have not added any new tests for this enhancement.
Is this a breaking change?
If this PR breaks any existing functionality, please explain how and why.
Does this PR require an update to the documentation?
If any features have changed, or have been added. Please explain how the documentation has been updated.
Checklist:
- [ ] The code has been tested locally
- [ ] Tests have been added to cover all new functionality
- [x] The documentation has been updated to reflect any changes
- [x] The code has been formatted with pre-commit
hello maintainers!
from my understanding of the issue, currently the time-coordinates were being generated with the help of fps, but as there is a chance that there could be irregularity with the frames, this method is not always correct. so, it was necessary functionality to let the user add a vector of timestamps as an input.
i have added a new parameter to the ValidPosesDataset and ValidBboxesDataset classes which signifies timestamps. I have updated the documentation accordingly and also changed the functionality of _ds_from_valid_data to make the changes.
I have not added any tests for this so far, and will do so on review by the maintainers. Also, I have not added validators for the timestamp vector as I was not sure if it was necessary.
Codecov Report
Attention: Patch coverage is 60.00000% with 4 lines in your changes missing coverage. Please review.
Project coverage is 99.05%. Comparing base (
e808116) to head (e355f10). Report is 1 commits behind head on main.
| Files with missing lines | Patch % | Lines |
|---|---|---|
| movement/io/load_bboxes.py | 50.00% | 2 Missing :warning: |
| movement/io/load_poses.py | 50.00% | 2 Missing :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## main #479 +/- ##
==========================================
- Coverage 99.32% 99.05% -0.27%
==========================================
Files 28 28
Lines 1477 1485 +8
==========================================
+ Hits 1467 1471 +4
- Misses 10 14 +4
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
🚀 New features to boost your workflow:
- ❄ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
Hi @TomatoChocolate12, thanks for opening this PR!
From my understanding of the issue, the time coordinates were previously generated using
fps. However, since there could be irregularities in the frames, this method is not always reliable. It was therefore necessary to introduce functionality that allows the user to provide a vector of timestamps as an input.
That’s precisely the issue, yes.
I have added a new
timestampsparameter to theValidPosesDatasetandValidBboxesDatasetclasses. I have also updated the documentation accordingly and modified the_ds_from_valid_datafunction to implement these changes.
Starting with the validators is a good approach. I agree with the general logic you've implemented—if the user provides timestamps, they should take precedence over fps. The priority for assigning time coordinates should be:
- Directly assign the provided
timestamps. We need to ensure that they are sequentially increasing and that their length matches the length of the time dimension in the data (these checks should be included in the validators). - If no
timestampsare provided, derive time coordinates from the givenfps(as is currently done). - If neither
fpsnortimestampsare available, use ascending integers (frame indices) as time coordinates (as is currently done).
That said, implementing this would require additional work beyond modifying the validators and _ds_from_valid_data:
- We need to expose the
timestampsargument in the public functions of theload_posessubmodule, as these are the actual user-facing functions where timestamps would be supplied. - We should consider which formats to support for timestamps. For example, should we accept only an array of floats (in seconds), or should we also support datetime formats? This requires some research into existing timestamp conventions.
If you'd like to explore this further, I’d suggest thinking about these two questions and proposing a solution within this PR. It would also be helpful to update the PR description by replacing the placeholder text from our template with a summary of what you've implemented.
Thanks again for your work on this!
Quality Gate passed
Issues
1 New issue
0 Accepted issues
Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code
hello @niksirbi , thanks for the review.
i added timestamps to all the user-facing functions and also added validators to check timestamps.
about the two questions that you posed, i would propose to ensure that the timestamps are in the millisecond format, but should be positively responsive to datetime input as well. We could always just convert the datetime values to milliseconds. I would assume that as the library is working with animal movement videos, the videos will likely not extend more than a couple of hours max. also, if the library wants to analyse the flapping of an insects' wing, etc. it would need timestamps in millisecond values.
so, this would be my proposed solution:
timestamps accepts both datetime and numerical (positive values).
the numerical values are assumed to be in milliseconds.
the datetime value could be converted to milliseconds within the code.
Thanks @TomatoChocolate12. I would have to over this PR again, and think about the milliseconds + datetime support. I will get back to you on this.
I have a different but related question. It would be nice to add support for date+time (as in, actually date of the day and time for every frame). In my use-case (and probably in other cases), the videos (and files) contains the timestamp the video was started. It would be nice to have, in addition to the time, the datetime as a coord.
I tried to do so myself (by replacing time with datetime) but got into issues with the kinematics calculations and lates with the resampling functions.
The way I am doing it right now is like so:
datetime_vector = pd.date_range(
start=recording_datetime,
periods=len(combined_container.coords['time']),
freq=pd.DateOffset(seconds=1/fps)
)
combined_container['datetime'] = ('time', datetime_vector)
I can continue doing this but I was wondering if its something that should be implemented in the future.
Thanks for the input, @hummuscience, this is a really valuable point.
Yes, we definitely want to support proper datetime handling in the future. It will be particularly important for synchronising behavioural data with other time-aligned sources like neurophysiological recordings. That said, it’s a non-trivial change and will likely require several PRs to fully implement. This current PR is just the first step in that direction.
We’ve started thinking about the best way to support datetime data in movement. Here’s a rough (and still evolving) roadmap:
-
At the moment, we represent elapsed time in seconds from the start of the recording, and we assume a constant frame rate (fps).
-
The next step, which we’re hoping to achieve in this PR, is to allow passing a custom array of (elapsed) time-stamps, removing the constant frame rate assumption.
-
From there, we want to enable mapping to actual calendar datetimes. There are a few possible approaches:
- Store only elapsed time as the
timecoordinate, but add astart_datetimefield in the dataset metadata. This would allow us to compute the actual datetime for any frame if needed. We could provide convenience methods for converting between the two. From your code snippet, it looks like you’re doing something very similar, so I hope we’re on the same page here. - Provide both elapsed time and datetime as coordinate labels for the
timedimension. This would allow indexing based on either representation while keeping existing functionality (which relies on elapsed time) intact. I'm not entirely sure ifxarraysupports multiple coordinate labels for the same dimension in a clean way — this needs further exploration. - Fully convert the
timecoordinate to datetimes and use that as the standard. Personally, I’d be hesitant about this: while it may make sense in some domains, it adds a lot of complexity, especially for users who don’t need datetimes and just want simple elapsed seconds. Floats are simpler to work with, and many applications don’t require absolute time.
- Store only elapsed time as the
So in short, yes, datetime support is definitely on the roadmap, but we want to implement it in a way that’s both flexible and minimally disruptive. Your use case and example are really helpful in guiding that design, so feel free to further clarify.
I think option 1 is the easiest for now. Maybe even just as an example in one of the tutorials, since I am sure someone will actually stumble onto the same problem.
Option 3 got me into some weird issues at a later point, so yeah, that would be problematic.
Another thing that perhaps goes with this is concatenation of multiple videos. In my use-case, my setup produces 15-minute long videos with a timestamp (to avoid frame skips when camera buffers are full). Possibly too specific of a use-case, but I also had to deal with correct orders of the movement objects and correcting the ~time~ coordinate so it's correct when concatenated.
I think option 1 is the easiest for now. Maybe even just as an example in one of the tutorials, since I am sure someone will actually stumble onto the same problem.
Starting with an example is a good idea. Would you be willing to have a go at contributing an example (could be loosely based on your workflow)? No pressure, of course.
Another thing that perhaps goes with this is concatenation of multiple videos. In my use-case, my setup produces 15-minute long videos with a timestamp (to avoid frame skips when camera buffers are full). Possibly too specific of a use-case, but I also had to deal with correct orders of the movement objects and correcting the ~time~ coordinate so it's correct when concatenated.
Not a specific use-case at all, many labs do that, for the same reason as you. So basically, you had to time-shift each 15-min data segment, such that the time axis makes sense after concatenation? If so, may be also a good addition for the aforementioned example.
Alright, I will put in an example of how I am doing this :)
Quick question. Is there some sample data with consecutive files?
Quick question. Is there some sample data with consecutive files?
Hmm, no we don't have that. I'm wondering how we could go around this. We could add more sample data like this, but we also want to keep the sample data files small (they are fetched during CI, and we don't want that to slow us down). We could simulate the situation by splitting an existing sample dataset into "chunks" and then re-assembling them, but the example would feel a bit contrived if we first implement the split ourselves and then undo it.
Perhaps we do want to add such sample data then. But to keep things small, maybe just 2 files, and only the predicted pose tracks (no videos). Would you have such a pair of files to share, in one of our supported formats?
I think I can supply two/three files (to also demonstrate ordering and correcting the timestamps).
I think I can supply two/three files (to also demonstrate ordering and correcting the timestamps).
Perfect! Feel free to share them with me any way that's convenient to you, including a short description of what's inside. I can have a look and upload them to our data repository.
Putting this here for future reference (feel free to delete and put where necessary).
using the scale (movement.transform) on an xArray that has datetime as a data variable, fails:
Traceback (most recent call last):
File "<string>", line 17, in __PYTHON_EL_eval
File "/Volumes/projects/OWpsilocybin/code/in_roi.py", line 17, in <module>
'mid_backend', 'mid_backend2', 'mid_backend3', 'tail_base', 'left_shoulder',
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/abdelhaym/.conda/envs/movement-env/lib/python3.12/site-packages/movement/transforms.py", line 66, in scale
scaled_data = data * factor
~~~~~^~~~~~~~
File "/Users/abdelhaym/.conda/envs/movement-env/lib/python3.12/site-packages/xarray/core/_typed_ops.py", line 223, in __mul__
return self._binary_op(other, operator.mul)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/abdelhaym/.conda/envs/movement-env/lib/python3.12/site-packages/xarray/core/dataset.py", line 7581, in _binary_op
ds = self._calculate_binary_op(g, other, join=align_type)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/abdelhaym/.conda/envs/movement-env/lib/python3.12/site-packages/xarray/core/dataset.py", line 7650, in _calculate_binary_op
new_vars = {k: f(self.variables[k], other_variable) for k in self.data_vars}
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/abdelhaym/.conda/envs/movement-env/lib/python3.12/site-packages/xarray/core/_typed_ops.py", line 935, in __mul__
return self._binary_op(other, operator.mul)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/abdelhaym/.conda/envs/movement-env/lib/python3.12/site-packages/xarray/core/variable.py", line 2331, in _binary_op
f(self_data, other_data) if not reflexive else f(other_data, self_data)
^^^^^^^^^^^^^^^^^^^^^^^^
numpy._core._exceptions._UFuncBinaryResolutionError: ufunc 'multiply' cannot use operands with types dtype('<M8[ns]') and dtype('float64')
Hey @TomatoChocolate12, apologies for the long silence on this. I’ve been quite busy and also spent some time reading up on datetime data types and exploring timestamp functionality, mainly because I needed to figure things out for a different project.
I think I now have a better grasp of the problem and what’s needed for a workable solution in movement. We also discussed this PR today in the community call.
In short, I no longer think we should continue down this path and would instead like to suggest an alternative approach.
Adding timestamps as an argument to the public loader functions (as you’ve done here) is likely to get quite complicated over time. As this PR already shows, we need to update multiple functions to support passing timestamps as an argument, plus the corresponding tests. And this is just the simplest case—timestamps as an array of floats in seconds. Realistically, we’d need to handle more complex cases, like timestamps in various datetime formats. Implementing type conversions and deciding what takes precedence (timestamps vs. fps) would make the loading functions overly complex.
Also, keep in mind that the loader functions are often the first part of movement that users encounter. Presenting them with too many options at that stage might be confusing, especially since many users don’t need timestamps functionality at all.
For all these reasons, I’d like to propose a different way forward:
Keep the loader functions unchanged, just as they are now and instead implement the timestamps functionality as a separate module—say, movement.utils.time. This module could offer utilities for:
- Reading timestamps from a file or array-like object and assigning them as time coordinates to existing
xr.Dataset/xr.DataArrayobjects. - Creating datetime timestamps from a known
start_datetimeandfps. - Converting between datetime and “time elapsed” formats, so users can switch back and forth.
Users who need timestamps could:
- Load datasets as before, using an estimated
fpsor leaving it asNone. - Then use the new time utilities to adjust
timecoordinates as needed (seconds, datetimes, regular or irregular intervals, etc.).
The downside is that users would need to take an extra step instead of everything being handled during data loading. However, this separation keeps the I/O module simpler and allows users to flexibly adjust time representations at any point in their analysis.
I plan to start working on these “time utilities” in a separate branch, building on what I’ve learned from my recent work. There’s still a lot to refine, especially ensuring that all of movement’s existing functionality continues to work seamlessly with datetime indices. As @hummuscience flagged above, some functions will need to be updated to handle this properly, which requires a good grasp of the entire codebase.
I’m sorry we won’t move forward with your proposed implementation. That’s no reflection on your work—I really appreciate your efforts here. Often, it’s only by trying out different approaches that we clarify the best way forward. Your contribution was instrumental in helping us think this through.
I really appreciate your help in getting us this far!
@niksirbi feel free to keep me updated when you have a new branch testing this. Since this is my general use case (100s of videos) and I would love to help find a good way to deal with this
@niksirbi feel free to keep me updated when you have a new branch testing this. Since this is my general use case (100s of videos) and I would love to help find a good way to deal with this
Will do. I now got my hands on a similar dataset, with videos saved in 1-hour segments (spanning weeks), so I can develop and debug this functionality.
Writing here to express a +1 for the need for this and to follow further conversations on timestamp loading