Bug in .webm pose estimation/visualization? 1000 fps
I'd like to add .webm support to the bin utilities. The Sem-Lex dataset, for example, offers files in .webm format.
If #126 is accepted, it would be simple to add the ".webm" extension to videos_to_poses and to recursively find and run pose estimation on .webm files.
However, there appears to be a bug somewhere in the pipeline. I pulled a small sample from the Sem-Lex test set and ran videos_to_poses. A number of the resulting files registered as having 1000 frames per second and, when visualized, play back in a weird, choppy way.
01Mv9aUuOQfLOlBYKffO.pose from sem-lex 01Mv9aUuOQfLOlBYKffO.mp4 for example, visualizes thus:
(see https://huggingface.co/spaces/cdleong/explore-pose-components)
and pose_info shows:
NumPyPoseBody
FPS: 1000.0
Data: <class 'numpy.ma.core.MaskedArray'> (42, 1, 576, 3), float32
Confidence shape: <class 'numpy.ndarray'> (42, 1, 576), float32
Duration (seconds): 0.042
Output log:
(pose-format-src) cleong@act3admin-Precision-7730:~/data/Sem-Lex/a_few_samples$ find . -name "*.pose"|parallel -j1 "echo '###############'; echo;pose_info -i '{}'|tail -n5;echo {};echo '**************'"
###############
FPS: 1000.0
Data: <class 'numpy.ma.core.MaskedArray'> (48, 1, 576, 3), float32
Confidence shape: <class 'numpy.ndarray'> (48, 1, 576), float32
Duration (seconds): 0.048
./01PP5tpqbxfOFTfCOAXs.pose
**************
###############
FPS: 1000.0
Data: <class 'numpy.ma.core.MaskedArray'> (67, 1, 576, 3), float32
Confidence shape: <class 'numpy.ndarray'> (67, 1, 576), float32
Duration (seconds): 0.067
./03bp8UWm3yyZo3gldgAJ.pose
**************
###############
FPS: 30.517711639404297
Data: <class 'numpy.ma.core.MaskedArray'> (56, 1, 576, 3), float32
Confidence shape: <class 'numpy.ndarray'> (56, 1, 576), float32
Duration (seconds): 1.834999971875123
./035rS0li9kijNugTYM3S.pose
**************
###############
FPS: 29.970029830932617
Data: <class 'numpy.ma.core.MaskedArray'> (59, 1, 576, 3), float32
Confidence shape: <class 'numpy.ndarray'> (59, 1, 576), float32
Duration (seconds): 1.9686333424701838
./019hCMsS4UU73vv6iNKu.pose
**************
###############
FPS: 30.508474349975586
Data: <class 'numpy.ma.core.MaskedArray'> (81, 1, 576, 3), float32
Confidence shape: <class 'numpy.ndarray'> (81, 1, 576), float32
Duration (seconds): 2.6550000196933747
./02EzNyg4nr60lATmx204.pose
**************
###############
FPS: 30.0
Data: <class 'numpy.ma.core.MaskedArray'> (61, 1, 576, 3), float32
Confidence shape: <class 'numpy.ndarray'> (61, 1, 576), float32
Duration (seconds): 2.033333333333333
./002kigTenJAA7b01kY24.pose
**************
###############
FPS: 30.021141052246094
Data: <class 'numpy.ma.core.MaskedArray'> (71, 1, 576, 3), float32
Confidence shape: <class 'numpy.ndarray'> (71, 1, 576), float32
Duration (seconds): 2.3650000470148016
./02bIch8058azoCPQjsP3.pose
**************
###############
FPS: 1000.0
Data: <class 'numpy.ma.core.MaskedArray'> (42, 1, 576, 3), float32
Confidence shape: <class 'numpy.ndarray'> (42, 1, 576), float32
Duration (seconds): 0.042
./01Mv9aUuOQfLOlBYKffO.pose
**************
###############
FPS: 1000.0
Data: <class 'numpy.ma.core.MaskedArray'> (88, 1, 576, 3), float32
Confidence shape: <class 'numpy.ndarray'> (88, 1, 576), float32
Duration (seconds): 0.088
./02SB4SPnq6rO64tbXgIL.pose
**************
###############
FPS: 31.208053588867188
Data: <class 'numpy.ma.core.MaskedArray'> (31, 1, 576, 3), float32
Confidence shape: <class 'numpy.ndarray'> (31, 1, 576), float32
Duration (seconds): 0.993333336592917
./02yRSX58pYUwIn1deC3w.pose
**************
Video IDs which give me the "1000 fps" issue:
01PP5tpqbxfOFTfCOAXs 03bp8UWm3yyZo3gldgAJ 01Mv9aUuOQfLOlBYKffO 02SB4SPnq6rO64tbXgIL
Checking with ffprobe, the files with the issue all seem to use the vp9 codec and report r_frame_rate=1000/1. The ones with no issue use the h264 codec and have a variety of frame rates, such as r_frame_rate=600/1 and r_frame_rate=30/1.
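For anyone wanting to script this ffprobe check, here is a minimal sketch. It assumes ffprobe was run with `-show_streams -of json` and only parses the resulting JSON text. The helper names (`parse_rate`, `summarize_video_streams`) and the sample JSON are made up for illustration and are not part of pose-format.

```python
import json
from fractions import Fraction


def parse_rate(rate):
    """Convert an ffprobe rate string such as '60000/1001' to a float.

    Returns None for '0/0', which ffprobe prints when a stream has no frame rate.
    """
    num, _, den = rate.partition("/")
    den = den or "1"
    if int(den) == 0:
        return None
    return float(Fraction(int(num), int(den)))


def summarize_video_streams(ffprobe_json):
    """Return codec and r_frame_rate for each video stream, skipping other streams."""
    streams = json.loads(ffprobe_json).get("streams", [])
    return [
        {"codec": s.get("codec_name"), "fps": parse_rate(s.get("r_frame_rate", "0/0"))}
        for s in streams
        if s.get("codec_type") == "video"
    ]


# Hypothetical ffprobe output, shaped like the vp9 files described above.
sample = (
    '{"streams": ['
    '{"codec_type": "video", "codec_name": "vp9", "r_frame_rate": "1000/1"}, '
    '{"codec_type": "audio", "codec_name": "opus", "r_frame_rate": "0/0"}'
    "]}"
)
print(summarize_video_streams(sample))
```

Filtering on `codec_type` keeps non-video streams out of the summary, so a `0/0` rate on an audio track is not mistaken for a video problem.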
Possibly related, they mention a 1k fps issue here: https://stackoverflow.com/questions/18123376/webm-to-mp4-conversion-using-ffmpeg
Apparently, for this to work, you need a specific optional ffmpeg feature or codec? https://trac.ffmpeg.org/wiki/Encode/VP9 https://trac.ffmpeg.org/wiki/CompilationGuide/Ubuntu
Edit: no, installing that apt package didn't fix it. It seems to be an arcane intricacy of the vp9 codec:
- https://stackoverflow.com/questions/46571544/incorrect-fps-when-muxing-vp9-encoded-data-into-webm
- https://video.stackexchange.com/questions/14718/how-do-i-alter-just-the-framerate-of-a-vp9-webm-without-affecting-anything-else
Well, I asked ChatGPT, and it seems that the answer, for files like this, is to ignore the metadata and keep asking for the next frame until they have all been read.
https://chatgpt.com/share/673b82b7-0680-800e-b6b8-280b03cfa310
while True:
    ret, frame = cap.read()
    if not ret:  # no more frames to read
        break
Testing this on the "1000 fps" files, I get
Total frames processed for 1000fps/03bp8UWm3yyZo3gldgAJ.webm: 67
Total frames processed for 1000fps/01PP5tpqbxfOFTfCOAXs.webm: 48
Total frames processed for 1000fps/01Mv9aUuOQfLOlBYKffO.webm: 42
Total frames processed for 1000fps/02SB4SPnq6rO64tbXgIL.webm: 88
which seem like reasonable numbers.
...but then this is already what the pose estimation script does, which is why these counts match the frame counts in the .pose files above.
Where this may become a problem is in PoseVisualizer, which uses self.pose.body.fps, in save_video for example.
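To make the effect of the fps value concrete, here is a small arithmetic sketch. It only assumes that duration is frame count divided by fps, which matches the pose_info output above (42 frames at 1000 fps gives 0.042 s). The function name and the 30 fps comparison value are hypothetical.

```python
def playback_duration_seconds(num_frames, fps):
    """Duration implied by a frame count and a frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return num_frames / fps


frames = 42  # frame count pose_info reports for 01Mv9aUuOQfLOlBYKffO.pose

# With the fps stored in the .pose file, the whole clip lasts 42 ms.
print(playback_duration_seconds(frames, 1000.0))

# With a hypothetical, more plausible rate of 30 fps it would last over a second.
print(playback_duration_seconds(frames, 30.0))
```

Anything that writes video using the stored fps would inherit the 42 ms duration, which may be why the visualization looks off.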
I think this may be the source of some downstream issues I ran into with signCLIP as well. Hmm.
Went back and checked, and yes, all the ones with a "negative dimension" error have unusual frame rates:
find ./files_with_negative_dimension_error_when_embedding/ -name "*.mp4"|parallel -j1 "echo '*************';ffprobe -v error -show_streams {}|grep frame;echo {}"
*************
has_b_frames=2
r_frame_rate=15061/500
avg_frame_rate=15061/500
nb_frames=258
nb_read_frames=N/A
r_frame_rate=0/0
avg_frame_rate=0/0
nb_frames=404
nb_read_frames=N/A
./files_with_negative_dimension_error_when_embedding/1628439615884465-SIGN_LANGUAGE.mp4
*************
has_b_frames=2
r_frame_rate=30/1
avg_frame_rate=337000/11233
nb_frames=337
nb_read_frames=N/A
r_frame_rate=0/0
avg_frame_rate=0/0
nb_frames=527
nb_read_frames=N/A
./files_with_negative_dimension_error_when_embedding/42667478960180394-SIGN_LANGUAGE.mp4
*************
has_b_frames=2
r_frame_rate=60000/1001
avg_frame_rate=60000/1001
nb_frames=303
nb_read_frames=N/A
r_frame_rate=0/0
avg_frame_rate=0/0
nb_frames=239
nb_read_frames=N/A
./files_with_negative_dimension_error_when_embedding/8374881306003543-OBSESS.mp4
*************
has_b_frames=2
r_frame_rate=30/1
avg_frame_rate=335000/11167
nb_frames=335
nb_read_frames=N/A
r_frame_rate=0/0
avg_frame_rate=0/0
nb_frames=523
nb_read_frames=N/A
./files_with_negative_dimension_error_when_embedding/34339690863866057-SHORT_PERSON.mp4
*************
has_b_frames=2
r_frame_rate=30/1
avg_frame_rate=262000/8733
nb_frames=262
nb_read_frames=N/A
r_frame_rate=0/0
avg_frame_rate=0/0
nb_frames=410
nb_read_frames=N/A
./files_with_negative_dimension_error_when_embedding/375739576064793-ELECTRICIAN.mp4
*************
has_b_frames=2
r_frame_rate=60000/1001
avg_frame_rate=60000/1001
nb_frames=498
nb_read_frames=N/A
r_frame_rate=0/0
avg_frame_rate=0/0
nb_frames=390
nb_read_frames=N/A
./files_with_negative_dimension_error_when_embedding/28740895579357995-IRON_2.mp4
*************
has_b_frames=0
r_frame_rate=30/1
avg_frame_rate=30/1
nb_frames=257
nb_read_frames=N/A
./files_with_negative_dimension_error_when_embedding/20233165380117013-FENCE_2.mp4
*************
has_b_frames=0
r_frame_rate=30/1
avg_frame_rate=30/1
nb_frames=269
nb_read_frames=N/A
./files_with_negative_dimension_error_when_embedding/42295410903451014-DEAF.mp4
*************
has_b_frames=2
r_frame_rate=30/1
avg_frame_rate=262000/8733
nb_frames=262
nb_read_frames=N/A
r_frame_rate=0/0
avg_frame_rate=0/0
nb_frames=410
nb_read_frames=N/A
./files_with_negative_dimension_error_when_embedding/5648363198357482-ARM_2.mp4
*************
has_b_frames=2
r_frame_rate=30153/1000
avg_frame_rate=30153/1000
nb_frames=257
nb_read_frames=N/A
r_frame_rate=0/0
avg_frame_rate=0/0
nb_frames=402
nb_read_frames=N/A
./files_with_negative_dimension_error_when_embedding/39030907344681975-JEWELRY.mp4
In particular, most of these seem to have two "streams", one of which has a frame rate of 0/0.
Edit: ah, those are the audio streams. One would think cv2 would have no issue with this?
could you share a video file that fails (has 1000 fps)?
Note: if I do the .webm to .mp4 conversion described in https://stackoverflow.com/questions/18123376/webm-to-mp4-conversion-using-ffmpeg, manually setting the fps to 24, the resulting .pose file ends up with 24fps, and the .gif looks much more reasonable.
This is from 01Mv9aUuOQfLOlBYKffO.webm from the Sem-Lex dataset.
ffmpeg command used:
ffmpeg -fflags +genpts -i 01Mv9aUuOQfLOlBYKffO.webm -r 24 01Mv9aUuOQfLOlBYKffO.mp4
could you share a video file that fails (has 1000 fps)?
DM'ed you some samples from the Sem-Lex dataset
Here's an example file I was sent
The file is badly created. It has no duration, no fps, and the frames are not equally distributed.
Running
ffprobe -v error -select_streams v:0 -show_frames -show_entries frame=pts_time -of csv=p=0 01Mv9aUuOQfLOlBYKffO.webm
we get the frame times.
If we plot them alongside a regression line, we see that they don't sit on the line.
The pose-format library does not support arbitrary frame times.
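The same check can be done numerically instead of by eye. This is a rough sketch, not something pose-format provides: it fits a least-squares line of timestamp against frame index and reports the largest deviation. The function names and the example timestamps are invented for illustration.

```python
def fit_line(times):
    """Least-squares fit of time = slope * frame_index + intercept."""
    n = len(times)
    if n < 2:
        raise ValueError("need at least two timestamps")
    mean_x = (n - 1) / 2
    mean_y = sum(times) / n
    sxx = sum((i - mean_x) ** 2 for i in range(n))
    sxy = sum((i - mean_x) * (t - mean_y) for i, t in enumerate(times))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def max_residual(times):
    """Largest absolute distance (in seconds) between a timestamp and the fitted line."""
    slope, intercept = fit_line(times)
    return max(abs(t - (slope * i + intercept)) for i, t in enumerate(times))


# Made-up timestamps: one evenly spaced series, one with uneven gaps.
regular = [i / 30 for i in range(10)]
jittery = [0.0, 0.02, 0.08, 0.10, 0.15, 0.16, 0.21, 0.24, 0.30, 0.31]

print("regular:", max_residual(regular))
print("jittery:", max_residual(jittery))
```

A residual that is large relative to the average frame interval suggests that a single fps value cannot describe the file exactly.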
Still, if we want to get the fps for these kinds of videos, we can ask:
- if the video has no duration and cv2 reports an fps of 1000, then
- run an estimation of the fps, such as:
ffprobe -v error -select_streams v:0 -show_frames -show_entries frame=pts_time -of csv=p=0 01Mv9aUuOQfLOlBYKffO.webm | awk 'NR > 1 { diff = $1 - prev; sum += diff; count++ } { prev = $1 } END { print "Assumed FPS:", 1 / (sum / count) }'
This results in: Assumed FPS: 30.1471
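For reference, a Python equivalent of the awk estimate might look like the sketch below. It computes one over the mean gap between consecutive frame times, as the awk one-liner does. `estimate_fps` and `read_pts_times` are hypothetical helper names, and `read_pts_times` assumes ffprobe is on the PATH and simply wraps the command shown above.

```python
import subprocess


def estimate_fps(pts_times):
    """Estimate fps as 1 / (mean gap between consecutive frame timestamps)."""
    if len(pts_times) < 2:
        raise ValueError("need at least two timestamps")
    diffs = [b - a for a, b in zip(pts_times, pts_times[1:])]
    mean_diff = sum(diffs) / len(diffs)
    if mean_diff <= 0:
        raise ValueError("timestamps do not increase on average")
    return 1.0 / mean_diff


def read_pts_times(video_path):
    """Run the ffprobe command from this thread and parse one pts_time per frame."""
    cmd = [
        "ffprobe", "-v", "error", "-select_streams", "v:0",
        "-show_frames", "-show_entries", "frame=pts_time",
        "-of", "csv=p=0", video_path,
    ]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    times = []
    for line in out.splitlines():
        field = line.split(",")[0].strip()
        try:
            times.append(float(field))
        except ValueError:
            continue  # skip blank or non-numeric lines such as "N/A"
    return times


# Example with in-memory timestamps, so no video file is needed here.
print("Assumed FPS:", estimate_fps([0.0, 0.1, 0.2, 0.3]))
```

Usage on a real file would be `estimate_fps(read_pts_times("01Mv9aUuOQfLOlBYKffO.webm"))`.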
@cleong110 would you be interested in supporting this kind of case?
That's pretty cool detective work! I thought that video looked weird. I think a large proportion of the videos in the Sem-Lex dataset have issues along these lines - when I ran the signclip embedding tool I was getting many, many errors that ultimately stemmed from the frame count being inflated.
As for whether to support automatic estimation of fps, I think that from an end-user perspective, one of the following would be nice:
1. It "just works" on various videos, including ones with issues. That would be ideal, but I don't know if it's realistic.
2. It at least detects and warns about issues like this, and lets the user handle it.
Honestly, I sort of lean towards option 2. If the source file has issues, warning about them rather than silently passing the issue downstream seems best.
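A detect-and-warn check could be as small as the sketch below. This is only an illustration of the idea, not existing pose-format behavior: `check_fps` is a hypothetical helper, and the threshold of 1000 is an assumption based on the value reported for the problematic files in this thread.

```python
import warnings

# Assumed threshold: the problematic files in this thread all reported exactly 1000 fps.
SUSPICIOUS_FPS = 1000.0


def check_fps(fps, source="<video>"):
    """Return True if fps looks plausible; otherwise emit a warning and return False."""
    if fps is None or fps <= 0 or fps >= SUSPICIOUS_FPS:
        warnings.warn(
            f"{source}: reported fps {fps!r} looks unreliable; "
            "the file's frame rate metadata may be missing or wrong",
            stacklevel=2,
        )
        return False
    return True


print(check_fps(29.97, "good.mp4"))
print(check_fps(1000.0, "suspicious.webm"))
```

The caller can then decide whether to skip the file, estimate the fps some other way, or re-encode it.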