
Support video and frames files associated with sample data

Open niksirbi opened this issue 2 months ago • 2 comments

Description

What is this PR

  • [ ] Bug fix
  • [x] Addition of a new feature
  • [ ] Other

Why is this PR needed?

Having video files, and/or frames extracted from those videos, associated with existing sample pose files will greatly facilitate the development and debugging of GUIs, because it would allow us to plot trajectories over a meaningful background, define ROIs, etc. See all the issues linked in References.

What does this PR do?

It overhauls the sample_data.py module to allow for the fetching of videos and/or frames alongside the fetching of pose files. All the changes were done in conjunction with changes to the data repository on GIN and should be interpreted together.

Changes to the data repository:

  • added folders for "videos" and "frames", in addition to the existing "poses" folder. Populated these folders with files for which the researcher(s) have given permission to share. Some video/frame files are shared across multiple pose files (e.g. because the same video was analysed with both DeepLabCut and SLEAP).
  • the metadata file is now named metadata.yaml. I added new metadata fields, to express the association between pose datasets and videos/frames. Here's an example entry:
- file_name: "SLEAP_three-mice_Aeon_proofread.analysis.h5"
  sha256sum: "82ebd281c406a61536092863bc51d1a5c7c10316275119f7daf01c1ff33eac2a"
  source_software: "SLEAP"
  fps: 50
  species: "mouse"
  number_of_individuals: 3
  shared_by:
    name: "Chang Huan Lo"
    affiliation: "Sainsbury Wellcome Centre, UCL"
  frame:
    file_name: "three-mice_Aeon_frame-5sec.png"
    sha256sum: "889e1bbee6cb23eb6d52820748123579acbd0b2a7265cf72a903dabb7fcc3d1a"
  video:
    file_name: "three-mice_Aeon_video.avi"
    sha256sum: "bc7406442c90467f11a982fd6efd85258ec5ec7748228b245caf0358934f0e7d"
  note: "All labels were proofread (user-defined) and can be considered ground truth. It was exported from the .slp file with the same prefix."
  • added a new convenience script get_sha256_hashes.py which will iterate over all files in poses, videos, and frames and write the results to txt files (poses_hashes.txt, videos_hashes.txt, frames_hashes.txt). It doesn't go all the way to fully automate the generation of metadata.yaml entries, but it is an improvement on previous practices.
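
For readers unfamiliar with the hashing step, here is a minimal sketch of what a script like get_sha256_hashes.py might do. It is not the actual script from the data repository: the function names (sha256_of_file, write_hashes) and the "file_name hash" line format of the txt files are assumptions made for illustration. Only the folder names and the output file names come from the description above.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in fixed-size chunks so large videos are never fully in memory
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_hashes(data_dir: Path, folders=("poses", "videos", "frames")) -> dict:
    """Write <folder>_hashes.txt for each existing folder.

    Returns a nested dict {folder: {file_name: sha256}}.
    The one-line-per-file output format is an assumption.
    """
    results = {}
    for folder in folders:
        folder_path = data_dir / folder
        if not folder_path.is_dir():
            continue  # skip folders that are not present
        hashes = {
            p.name: sha256_of_file(p)
            for p in sorted(folder_path.iterdir())
            if p.is_file()
        }
        lines = [f"{name} {digest}" for name, digest in hashes.items()]
        (data_dir / f"{folder}_hashes.txt").write_text("\n".join(lines) + "\n")
        results[folder] = hashes
    return results
```

Calling write_hashes(Path(".")) from the root of a local clone of the data repository would produce the three txt files, whose hashes could then be copied into the sha256sum fields of metadata.yaml.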

Changes to the code repository (this PR):

  • The sample_data.py module now exposes 3 public functions:
    • list_datasets(): returns the filenames of the sample pose files (the ones in the poses folder)
    • fetch_dataset_paths(filename): given the filename of a valid pose dataset (one that the above function returns), returns a dict of 3 local paths, with keys "poses", "video", "frame". If the video or frame is missing, the corresponding value is None.
    from movement.sample_data import fetch_dataset_paths
    paths = fetch_dataset_paths("DLC_single-mouse_EPM.predictions.h5")
    poses_path = paths["poses"]
    frame_path = paths["frame"]
    video_path = paths["video"]
    
    • fetch_dataset(filename): given the filename of a valid pose dataset (one that list_datasets() returns), calls fetch_dataset_paths(filename) and proceeds to load the "poses" into a movement dataset. The "video" and "frame" paths do not get loaded (for now); they are simply stored as dataset attributes.
    from movement.sample_data import fetch_dataset
    ds = fetch_dataset("DLC_single-mouse_EPM.predictions.h5")
    frame_path = ds.frame_path
    video_path = ds.video_path
    
    This is the function we expect to be most used, and the updated docs reflect that.
  • Tests, docs, and contributing guide have been updated accordingly. The availability of video frames means that we can also add images to the plots in our examples, but I haven't done this here, as it will be part of the big docs re-organisation #70.
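
To illustrate the return value of fetch_dataset_paths described above, here is a rough sketch of how one metadata.yaml entry could be mapped to the dict of three paths. It only shows the lookup logic, not the downloading or hash verification, and it is not the code from this PR: paths_from_entry and the data_dir argument are hypothetical names, and the entry is assumed to be already parsed into a Python dict.

```python
from pathlib import Path
from typing import Optional


def paths_from_entry(entry: dict, data_dir: Path) -> dict:
    """Map one metadata entry to local paths for poses, frame and video.

    Hypothetical helper: assumes the files live under
    data_dir / "poses", data_dir / "frames" and data_dir / "videos".
    """

    def optional_path(key: str, folder: str) -> Optional[Path]:
        # "frame" and "video" are nested mappings that may be absent or empty
        file_name = (entry.get(key) or {}).get("file_name")
        return data_dir / folder / file_name if file_name else None

    return {
        "poses": data_dir / "poses" / entry["file_name"],
        "frame": optional_path("frame", "frames"),
        "video": optional_path("video", "videos"),
    }


# Usage with a trimmed-down version of the example entry shown earlier
entry = {
    "file_name": "SLEAP_three-mice_Aeon_proofread.analysis.h5",
    "frame": {"file_name": "three-mice_Aeon_frame-5sec.png"},
}
paths = paths_from_entry(entry, Path("data"))
```

Because this trimmed entry has no "video" field, paths["video"] is None, which matches the behaviour described for missing videos or frames.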

References

Closes #38. Closes #121 because the syntax is much less awkward now (with fewer redundancies), and I think there is no longer a clear need for rewriting the sample_data.py module into a class.

Facilitates #105, #49, #50, #48, #164.

How has this PR been tested?

Updated existing tests in test_sample_data.py.

Is this a breaking change?

Yes, the API for fetching sample datasets has changed. This PR needs to be merged ahead of any others, because the changes to the GIN data repository have broken CI, and it will remain broken until this is merged.

Does this PR require an update to the documentation?

Yes, I've updated the relevant sections of the docs.

Checklist:

  • [x] The code has been tested locally
  • [x] Tests have been added to cover all new functionality
  • [x] The documentation has been updated to reflect any changes
  • [x] The code has been formatted with pre-commit

niksirbi avatar May 02 '24 14:05 niksirbi