Rationalize video metadata / video frame metadata
The original Schafer code had quite a few items of metadata. We must:
- Inventory these items and load them properly into our classes.
- Decide which of these items are necessary to the processing pipeline
- Decide what kinds of metadata (Schafer metadata + others we come up with) we should demand of incoming video files from the new Brown process, from Geppetto, and from other labs and models.
BasicWorm encapsulates the data contained in example_contour_and_skeleton_info.mat, one of the original Schafer files. This file contains two video frame metadata items:
- is_stage_movement, a boolean array giving for each frame of the video, whether or not the frame had a stage movement (camera is fixed, the worm is on a "stage" and this stage was slightly moved to keep the worm within the camera's field of view)
- is_valid, a boolean array, giving whether the frame was successfully segmented or not.
NormalizedWorm encapsulates the data contained in example_video_norm_worm.mat, one of the original Schafer files. This file contains the following frame metadata items:
- segmentation_status, a string of length n, where n is the number of frames, and where s = segmented, f = segmentation failed, m = stage movement, d = dropped frame, n = ??? (there is a reference to this in some old code). After loading we convert this to a numpy array. The field is_stage_movement = segmentation_status == 'm' is passed for use in calculating locomotion.turns. This is its only use for feature calculation.
- frame_codes, giving some similar information about frames. It is used only for the feature posture.coils: specifically, a coil is defined as starting on either a 105- (TooFewEnds) or 106- (DoubleLengthSide) coded frame and ending on a 1- (Success) coded frame, but it must be at least COIL_FRAME_THRESHOLD frames in length.
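To make the segmentation_status conversion concrete, here is a minimal sketch. The example string is made up for illustration, and splitting it into one character per frame is an assumption about how the loaded string is laid out; the actual loading code may differ.

```python
import numpy as np

# Hypothetical status string, one character per frame:
# s = segmented, f = segmentation failed, m = stage movement, d = dropped frame
segmentation_status = "ssmmsfsds"

# Split into single characters so the codes can be compared elementwise.
status = np.array(list(segmentation_status))

# Boolean per-frame masks derived from the status codes.
is_stage_movement = status == "m"
is_segmented = status == "s"
```

Only is_stage_movement is needed for feature calculation (locomotion.turns); is_segmented is shown just to illustrate that other masks can be derived the same way.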
WormFeatures encapsulates the data contained in example_video_feature_file.mat, one of the original Schafer files. This file contains many items of metadata:
(Under the "worm" heading are all the feature data. Everything under the "info" heading is the metadata.)

By the NormalizedWorm point, we have already converted coordinates to microns, so some of this metadata is not needed at this point. ventral_mode is needed to sign (give + or -) several features:
- locomotion.velocity, the motion direction.
- The amplitude and frequency of foraging.bends
- path.curvature
We also have "metadata" hardcoded in various places:
- Jim's Feature Processing Options
- config.py
- Our draft database schema
A way to group this metadata is whether it is describing:
- The experiment (e.g. lab name, timestamp)
- The video of the experiment (e.g. FPS, height, width, microns per pixel)
- A given frame (e.g. segmentation status)
Not clear whether ventral_mode belongs at the experiment level or the frame level, as perhaps in the future we could detect a roll in mid-video.
Also, perhaps we want the experiment metadata schema to be open, to allow for researchers to include whatever extra data they feel they want. We might want to require a few basic columns of information about the experiment like the lab and timestamp though.
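One way the "open schema with a few required columns" idea could look in code is sketched below. ExperimentInfo, its from_dict helper, and the choice of lab and timestamp as the only required fields are hypothetical, taken from the suggestion above rather than from any existing class.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Dict

# Hypothetical set of required experiment-level columns.
REQUIRED_FIELDS = ("lab", "timestamp")


@dataclass
class ExperimentInfo:
    """Experiment-level metadata: a few required fields plus open extras."""
    lab: str
    timestamp: datetime
    # Anything else a researcher wants to attach is kept here untouched.
    extra: Dict[str, Any] = field(default_factory=dict)

    @classmethod
    def from_dict(cls, record: Dict[str, Any]) -> "ExperimentInfo":
        missing = [k for k in REQUIRED_FIELDS if k not in record]
        if missing:
            raise ValueError("missing required metadata: " + ", ".join(missing))
        extra = {k: v for k, v in record.items() if k not in REQUIRED_FIELDS}
        return cls(lab=record["lab"], timestamp=record["timestamp"], extra=extra)
```

Video-level and frame-level metadata could follow the same pattern, each with its own small set of required fields.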
I will likely need access to the Schafer corpus drive to see how the files are stored there: whether metadata is encoded in the file names, stored alongside the videos, etc.
segmentation_status is calculated here: https://github.com/JimHokanson/SegwormMatlabClasses/blob/master/+seg_worm/@normalized_worm/normalized_worm.m
@slarson could you please place in our Google Drive folder an example folder from the Schafer corpus drive containing some .avi files along with any other accompanying files so I know what metadata is stored?
Thanks!
OK I'm done with is_stage_movement, is_valid, segmentation_status, frame_codes, and ventral_mode. They are all encapsulated now in the VideoInfo class by commit 9b1b0c3a7c8a96b0fcb0c2ae14ad6093236250ab.
For the posture.coils code to work, the frame codes need to be properly calculated. For posture.coils to work with @KezhiLi's new algorithm, we'll need either to redo how posture.coils is calculated or to bring in the frame code calculation code, at least for codes 105 and 106. I'll look into the latter option.
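For reference, the dependence of posture.coils on frame codes can be sketched as follows. This is not the existing implementation: find_coils is a hypothetical helper, the threshold value is a placeholder for whatever COIL_FRAME_THRESHOLD is actually set to, and it assumes that frames with any other code between the start and the closing Success frame count as part of the coil.

```python
COIL_START_CODES = (105, 106)  # TooFewEnds, DoubleLengthSide
COIL_END_CODE = 1              # Success
COIL_FRAME_THRESHOLD = 5       # placeholder value, not the real setting


def find_coils(frame_codes, min_frames=COIL_FRAME_THRESHOLD):
    """Return (start, end) index pairs; end is the Success frame closing the coil."""
    coils = []
    start = None
    for i, code in enumerate(frame_codes):
        if start is None:
            # Not inside a candidate coil: look for a start-coded frame.
            if code in COIL_START_CODES:
                start = i
        elif code == COIL_END_CODE:
            # A Success frame ends the candidate; keep it only if long enough.
            if i - start >= min_frames:
                coils.append((start, i))
            start = None
    # A candidate still open at the end of the video is dropped.
    return coils
```

The sketch shows why at least codes 105 and 106 (and the Success code) must be available per frame for the current coil definition to work.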
Ok, I started off by grabbing a random directory, but this one didn't have AVIs:
https://www.dropbox.com/sh/5xytewwbbtokj1u/AABh-KUrgV-25RiCfpixppina?dl=0
Using the path described in openworm/biological_data#1, I have uploaded a directory that has AVIs here:
https://www.dropbox.com/sh/rtzneo87y118sv0/AACJ1UnO2A1CwlxcuBPkKCFSa?dl=0
I'm also going to upload READMEs that say what path these were representing from the NAS