Ensuring correct flow computation and tweaking n_rgb
Dear deepethogram developers
I am testing the performance of my flow generators to ensure that they detect the subtle fly behaviors I am interested in. For this I am interactively running deepethogram.flow_generator.inference.extract_movie to get a better intuition of what the flow generators do.
In my tests, I see that microbehaviors like a leg twitch are not captured by the flow generator when using the default n_rgb=11. I was wondering whether, because my videos have a high frame rate (150 fps), 11 frames are simply too few to capture the overall behavior, i.e. within only 11 frames the animal is almost completely static. I am also worried that the flow computation seems to be there but is almost invisible, which suggests it may not be working 100%.
https://user-images.githubusercontent.com/13869571/233628578-78fb9dee-baf0-4932-88c8-161a380fd9e0.mp4
I have tried looking in the code and in https://arxiv.org/pdf/1704.00389.pdf for the impact on speed and accuracy of increasing n_rgb in the flow_generator training, but could not find an intuitive explanation beyond what I can guess from reading the paper and the linked preprint.
Is the flow computation shown in the video normal, or is there indeed a problem? If so, would increasing n_rgb help? Or is there a better approach?
Thanks,
Antonio
This is the same video after cropping out the flow (left) half, normalizing it, and putting it back next to the original RGB. You can see that the flow contains square artifacts, probably due to either
- reflecting the kernels, or
- the normalization, because the original span of the flow was 235 (min) to 255 (max), i.e. ~20 out of 255.
As a result, the leg movement is not clear.
https://user-images.githubusercontent.com/13869571/233634050-35fce212-2343-478b-859a-720c1455c594.mp4
NOTE: This is how I normalized the flow so that its span becomes 0-255
import cv2
import numpy as np

cap = cv2.VideoCapture("twitch.mp4")
absolute_min, absolute_max = 255, 0

while True:
    ret, frame = cap.read()
    if not ret:
        break
    # left half of the extract_movie output is the flow, right half is the original RGB
    original = np.uint8(frame[:, frame.shape[1] // 2:, :])
    frame = frame[:, :frame.shape[1] // 2, :]
    # keep a running min/max of the flow half across the whole video
    absolute_min = min(max(0, frame.min()), absolute_min)
    absolute_max = max(min(255, frame.max()), absolute_max)
    # stretch this frame's flow so that its span becomes 0-255
    frame = frame - frame.min()
    frame = np.uint8(255 * (frame / frame.max()))
    frame = np.hstack([frame, original])

print(absolute_min, absolute_max)
# 235 255
This is the same experiment for a Proboscis Extension behavior
https://user-images.githubusercontent.com/13869571/233635463-80cd81b8-1c3d-46e2-8ec8-2528d4b4dcd4.mp4
https://user-images.githubusercontent.com/13869571/233635491-87567bfe-53c0-461b-be2a-ad59a1e81abb.mp4
Also, please note that the artifacts in the flow are not a result of the .mp4 encoding, since they are also present in the .png files I saved without compression, like this:
# https://stackoverflow.com/a/59921563/3541756
cv2.imwrite(f"{str(i).zfill(6)}.png", frame, [cv2.IMWRITE_PNG_COMPRESSION, 0])
Revisiting my question, I noticed by studying the code further that deepethogram.flow_generator.inference.extract_movie has an argument I had glossed over, which caps how much flow magnitude (i.e. movement strength) the colors can represent: it sets the maximum movement that is still encoded with a distinct saturation, above which full saturation is used regardless of the flow magnitude.
This argument is maxval. By default it is 5, but in my videos the maximum flow magnitude is around 0.5, so my colors only cover about 10% of the possible range, meaning at most 10% saturation appears in the output video. This of course translates into overly white videos, given how the HSV system works (low saturation means white).
So to sum up, there is no need to normalize the video like I was doing (and it was a wrong way to normalize it anyway). Instead, the maxval parameter should be tweaked to something around the maximum magnitude in your flow outputs.
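For reference, this is roughly how I understand the magnitude-to-color mapping: a standard polar HSV encoding where direction maps to hue and magnitude to saturation, capped at maxval. This is just my own sketch (the flow_to_rgb name is mine, not deepethogram's actual implementation):

import cv2
import numpy as np

def flow_to_rgb(flow, maxval=0.5):
    # flow: float array of shape (H, W, 2) with per-pixel x/y displacements
    mag = np.sqrt(flow[..., 0] ** 2 + flow[..., 1] ** 2)
    ang = np.arctan2(flow[..., 1], flow[..., 0]) % (2 * np.pi)
    hsv = np.zeros((*flow.shape[:2], 3), dtype=np.uint8)
    hsv[..., 0] = np.uint8(ang * 180 / np.pi / 2)              # hue encodes direction (OpenCV hue range is 0-179)
    hsv[..., 1] = np.uint8(255 * np.clip(mag / maxval, 0, 1))  # saturation encodes magnitude, saturating at maxval
    hsv[..., 2] = 255                                          # full value, so low saturation reads as white
    return cv2.cvtColor(hsv, cv2.COLOR_HSV2RGB)

With the default maxval=5 and real magnitudes around 0.5, mag / maxval never exceeds ~0.1, so the saturation channel stays below ~25/255 and the whole frame looks washed out and white, which matches what I see.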
I have made this commit so that the HiddenTwoStream model uses the num_images given by the config instead of the hardcoded value of 11:
https://github.com/shaliulab/deepethogram/commit/9359aed9e581b07deddfa316b31788b974d9908e
I am training DEG by sequentially calling the two scripts in https://github.com/shaliulab/vsc-scripts/tree/91ea5e1acd1663e2b14f832f76941ad346717596/manual/deepethogram/train, like so:
python flow_generator_train.py --n-rgb 55 --epochs 10
python feature_extractor_train.py --n-rgb 55 --epochs 10 --flow-weights $PATH_TO_WEIGHTS_FROM_LAST_RUN --feat-weights pretrained
To be more explicit, this is how I signal to the DEG config that the models should use a different number of frames and flows (n_flows is n_rgb - 1, since the generator predicts one flow field per consecutive pair of RGB frames):
cfg.flow_generator.n_rgb = args.n_rgb
cfg.feature_extractor.n_flows = args.n_rgb - 1
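For completeness, a minimal sketch of that override step, assuming cfg is an OmegaConf config as in upstream deepethogram (the real scripts build the full config through the project's own machinery; this toy config only has the two relevant fields):

from omegaconf import OmegaConf

# toy config with only the two fields I touch; the real cfg has many more entries
cfg = OmegaConf.create({"flow_generator": {"n_rgb": 11},
                        "feature_extractor": {"n_flows": 10}})

n_rgb = 55
cfg.flow_generator.n_rgb = n_rgb
cfg.feature_extractor.n_flows = n_rgb - 1  # keep the two fields consistent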
A discussion on whether n_rgb needs to be matched to the frame rate of the videos, and in that case whether it makes more sense to increase n_rgb or to downsample the video's frame rate, would be appreciated.
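To make that question concrete, this is the simple arithmetic I am reasoning with (plain numbers at my 150 fps, nothing taken from the deepethogram code):

fps = 150
for n_rgb in (11, 55):
    window_ms = 1000 * n_rgb / fps
    print(f"n_rgb={n_rgb}: each clip spans {window_ms:.0f} ms of video")
# n_rgb=11: each clip spans 73 ms of video
# n_rgb=55: each clip spans 367 ms of video

If the twitch unfolds over more than ~73 ms, the animal barely moves within a single 11-frame clip, which matches my intuition above.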
Regardless of which n_rgb I use, the videos produced with extract_movie (and maxval=0.5) consistently show the same pattern: the fly has a homogeneous color throughout the recording. This is not what I expected, since there is no movement for most of the recording, and movements in opposite directions get the same color rather than opposing colors.
https://user-images.githubusercontent.com/13869571/236439476-d31367c5-10ef-43da-ad62-74cdc54a1559.mp4