
Bug in .webm pose estimation/visualization? 1000 fps

Open cleong110 opened this issue 1 year ago • 12 comments

I'd like to add .webm support to the bin utilities. The Sem-Lex dataset, for example, offers files in .webm format.

If #126 is accepted, it would be simple to add the ".webm" extension to videos_to_poses, and to recursively find and run pose estimation on .webm files.

However, there appears to be a bug somewhere in the pipeline. I pulled a small sample from the Sem-Lex test set and ran videos_to_poses. A number of the resulting files register as having 1000 frames per second and, when visualized, play back in a weird, choppy way.

For example, 01Mv9aUuOQfLOlBYKffO.pose, from Sem-Lex 01Mv9aUuOQfLOlBYKffO.mp4, visualizes thus: [image: 01Mv9aUuOQfLOlBYKffO]

(see https://huggingface.co/spaces/cdleong/explore-pose-components)

and pose_info shows:

NumPyPoseBody
FPS: 1000.0
Data: <class 'numpy.ma.core.MaskedArray'> (42, 1, 576, 3), float32
Confidence shape: <class 'numpy.ndarray'> (42, 1, 576), float32
Duration (seconds): 0.042

cleong110 avatar Nov 18 '24 17:11 cleong110

Output log:


(pose-format-src) cleong@act3admin-Precision-7730:~/data/Sem-Lex/a_few_samples$ find . -name "*.pose"|parallel -j1 "echo '###############'; echo;pose_info -i '{}'|tail -n5;echo {};echo '**************'"
###############

FPS: 1000.0
Data: <class 'numpy.ma.core.MaskedArray'> (48, 1, 576, 3), float32
Confidence shape: <class 'numpy.ndarray'> (48, 1, 576), float32
Duration (seconds): 0.048

./01PP5tpqbxfOFTfCOAXs.pose
**************
###############

FPS: 1000.0
Data: <class 'numpy.ma.core.MaskedArray'> (67, 1, 576, 3), float32
Confidence shape: <class 'numpy.ndarray'> (67, 1, 576), float32
Duration (seconds): 0.067

./03bp8UWm3yyZo3gldgAJ.pose
**************
###############

FPS: 30.517711639404297
Data: <class 'numpy.ma.core.MaskedArray'> (56, 1, 576, 3), float32
Confidence shape: <class 'numpy.ndarray'> (56, 1, 576), float32
Duration (seconds): 1.834999971875123

./035rS0li9kijNugTYM3S.pose
**************
###############

FPS: 29.970029830932617
Data: <class 'numpy.ma.core.MaskedArray'> (59, 1, 576, 3), float32
Confidence shape: <class 'numpy.ndarray'> (59, 1, 576), float32
Duration (seconds): 1.9686333424701838

./019hCMsS4UU73vv6iNKu.pose
**************
###############

FPS: 30.508474349975586
Data: <class 'numpy.ma.core.MaskedArray'> (81, 1, 576, 3), float32
Confidence shape: <class 'numpy.ndarray'> (81, 1, 576), float32
Duration (seconds): 2.6550000196933747

./02EzNyg4nr60lATmx204.pose
**************
###############

FPS: 30.0
Data: <class 'numpy.ma.core.MaskedArray'> (61, 1, 576, 3), float32
Confidence shape: <class 'numpy.ndarray'> (61, 1, 576), float32
Duration (seconds): 2.033333333333333

./002kigTenJAA7b01kY24.pose
**************
###############

FPS: 30.021141052246094
Data: <class 'numpy.ma.core.MaskedArray'> (71, 1, 576, 3), float32
Confidence shape: <class 'numpy.ndarray'> (71, 1, 576), float32
Duration (seconds): 2.3650000470148016

./02bIch8058azoCPQjsP3.pose
**************
###############

FPS: 1000.0
Data: <class 'numpy.ma.core.MaskedArray'> (42, 1, 576, 3), float32
Confidence shape: <class 'numpy.ndarray'> (42, 1, 576), float32
Duration (seconds): 0.042

./01Mv9aUuOQfLOlBYKffO.pose
**************
###############

FPS: 1000.0
Data: <class 'numpy.ma.core.MaskedArray'> (88, 1, 576, 3), float32
Confidence shape: <class 'numpy.ndarray'> (88, 1, 576), float32
Duration (seconds): 0.088

./02SB4SPnq6rO64tbXgIL.pose
**************
###############

FPS: 31.208053588867188
Data: <class 'numpy.ma.core.MaskedArray'> (31, 1, 576, 3), float32
Confidence shape: <class 'numpy.ndarray'> (31, 1, 576), float32
Duration (seconds): 0.993333336592917

./02yRSX58pYUwIn1deC3w.pose
**************

cleong110 avatar Nov 18 '24 17:11 cleong110

video IDs which give me "1000 fps" issues:

01PP5tpqbxfOFTfCOAXs 03bp8UWm3yyZo3gldgAJ 01Mv9aUuOQfLOlBYKffO 02SB4SPnq6rO64tbXgIL

cleong110 avatar Nov 18 '24 17:11 cleong110

Checking with ffprobe, the files with the issue all seem to use the vp9 codec and report r_frame_rate=1000/1. The ones with no issue use the h264 codec and have all kinds of frame rates, such as r_frame_rate=600/1 and r_frame_rate=30/1.
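
The same check can be scripted. The following is a minimal sketch, assuming ffprobe is on the PATH; the helper names probe_video_stream and summarize_stream are hypothetical and not part of pose-format.

```python
import json
import subprocess
from fractions import Fraction


def probe_video_stream(path):
    """Run ffprobe and return its JSON description of the first video stream."""
    cmd = [
        "ffprobe", "-v", "error",
        "-select_streams", "v:0",
        "-show_entries", "stream=codec_name,r_frame_rate,avg_frame_rate",
        "-of", "json",
        path,
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


def summarize_stream(ffprobe_json):
    """Extract (codec_name, r_frame_rate as float or None) from ffprobe JSON."""
    stream = json.loads(ffprobe_json)["streams"][0]
    num, den = stream["r_frame_rate"].split("/")
    # ffprobe reports rates as fractions such as "1000/1" or "60000/1001";
    # "0/0" means the rate is unknown, so avoid dividing by zero.
    rate = float(Fraction(int(num), int(den))) if int(den) != 0 else None
    return stream["codec_name"], rate
```

Calling summarize_stream(probe_video_stream("some_file.webm")) on one of the affected files would be expected to return the vp9 codec with a rate of 1000.0, matching what ffprobe printed above.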

cleong110 avatar Nov 18 '24 17:11 cleong110

Possibly related, they mention a 1k fps issue here: https://stackoverflow.com/questions/18123376/webm-to-mp4-conversion-using-ffmpeg

cleong110 avatar Nov 18 '24 17:11 cleong110

Apparently for this to work, you need a specific optional feature/codec... thing of ffmpeg? https://trac.ffmpeg.org/wiki/Encode/VP9 https://trac.ffmpeg.org/wiki/CompilationGuide/Ubuntu

Edit: no, installing that apt package didn't fix it. It seems to be an arcane intricacy of the vp9 codec:

  • https://stackoverflow.com/questions/46571544/incorrect-fps-when-muxing-vp9-encoded-data-into-webm
  • https://video.stackexchange.com/questions/14718/how-do-i-alter-just-the-framerate-of-a-vp9-webm-without-affecting-anything-else

cleong110 avatar Nov 18 '24 17:11 cleong110

Well, I asked ChatGPT, and it seems the answer for files like this is to ignore the metadata and keep asking for the next frame until they're all done.

https://chatgpt.com/share/673b82b7-0680-800e-b6b8-280b03cfa310

while True:
    ret, frame = cap.read()
    if not ret:
        break

Testing this on the "1000 fps" files, I get

Total frames processed for 1000fps/03bp8UWm3yyZo3gldgAJ.webm: 67
Total frames processed for 1000fps/01PP5tpqbxfOFTfCOAXs.webm: 48
Total frames processed for 1000fps/01Mv9aUuOQfLOlBYKffO.webm: 42
Total frames processed for 1000fps/02SB4SPnq6rO64tbXgIL.webm: 88

which seem like reasonable numbers
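
For reference, a fuller version of that counting loop might look like the sketch below. It assumes OpenCV (cv2) is installed; count_frames and count_frames_in_file are hypothetical names, not functions from pose-format.

```python
def count_frames(capture):
    """Count frames by reading until the capture reports no more frames.

    `capture` is anything with a cv2.VideoCapture-style read() method
    that returns an (ok, frame) pair.
    """
    count = 0
    while True:
        ok, _frame = capture.read()
        if not ok:
            break
        count += 1
    return count


def count_frames_in_file(path):
    """Open a video with OpenCV and count the frames it can decode."""
    import cv2  # imported lazily so count_frames works without OpenCV

    capture = cv2.VideoCapture(path)
    try:
        return count_frames(capture)
    finally:
        capture.release()
```

This ignores the container metadata entirely, so the count reflects only what the decoder actually yields.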

...but then this is actually what the pose estimation script does.

Where this may become a problem is in PoseVisualizer, which uses self.pose.body.fps, for example in save_video.

cleong110 avatar Nov 18 '24 18:11 cleong110

I think this may be the source of some downstream issues I ran into with signCLIP as well. Hmm.

Went back and checked, and yes, all the ones with a "negative dimension" error have unusual frame rates:

find ./files_with_negative_dimension_error_when_embedding/ -name "*.mp4"|parallel -j1 "echo '*************';ffprobe -v error -show_streams {}|grep frame;echo {}"
*************
has_b_frames=2
r_frame_rate=15061/500
avg_frame_rate=15061/500
nb_frames=258
nb_read_frames=N/A
r_frame_rate=0/0
avg_frame_rate=0/0
nb_frames=404
nb_read_frames=N/A
./files_with_negative_dimension_error_when_embedding/1628439615884465-SIGN_LANGUAGE.mp4
*************
has_b_frames=2
r_frame_rate=30/1
avg_frame_rate=337000/11233
nb_frames=337
nb_read_frames=N/A
r_frame_rate=0/0
avg_frame_rate=0/0
nb_frames=527
nb_read_frames=N/A
./files_with_negative_dimension_error_when_embedding/42667478960180394-SIGN_LANGUAGE.mp4
*************
has_b_frames=2
r_frame_rate=60000/1001
avg_frame_rate=60000/1001
nb_frames=303
nb_read_frames=N/A
r_frame_rate=0/0
avg_frame_rate=0/0
nb_frames=239
nb_read_frames=N/A
./files_with_negative_dimension_error_when_embedding/8374881306003543-OBSESS.mp4
*************
has_b_frames=2
r_frame_rate=30/1
avg_frame_rate=335000/11167
nb_frames=335
nb_read_frames=N/A
r_frame_rate=0/0
avg_frame_rate=0/0
nb_frames=523
nb_read_frames=N/A
./files_with_negative_dimension_error_when_embedding/34339690863866057-SHORT_PERSON.mp4
*************
has_b_frames=2
r_frame_rate=30/1
avg_frame_rate=262000/8733
nb_frames=262
nb_read_frames=N/A
r_frame_rate=0/0
avg_frame_rate=0/0
nb_frames=410
nb_read_frames=N/A
./files_with_negative_dimension_error_when_embedding/375739576064793-ELECTRICIAN.mp4
*************
has_b_frames=2
r_frame_rate=60000/1001
avg_frame_rate=60000/1001
nb_frames=498
nb_read_frames=N/A
r_frame_rate=0/0
avg_frame_rate=0/0
nb_frames=390
nb_read_frames=N/A
./files_with_negative_dimension_error_when_embedding/28740895579357995-IRON_2.mp4
*************
has_b_frames=0
r_frame_rate=30/1
avg_frame_rate=30/1
nb_frames=257
nb_read_frames=N/A
./files_with_negative_dimension_error_when_embedding/20233165380117013-FENCE_2.mp4
*************
has_b_frames=0
r_frame_rate=30/1
avg_frame_rate=30/1
nb_frames=269
nb_read_frames=N/A
./files_with_negative_dimension_error_when_embedding/42295410903451014-DEAF.mp4
*************
has_b_frames=2
r_frame_rate=30/1
avg_frame_rate=262000/8733
nb_frames=262
nb_read_frames=N/A
r_frame_rate=0/0
avg_frame_rate=0/0
nb_frames=410
nb_read_frames=N/A
./files_with_negative_dimension_error_when_embedding/5648363198357482-ARM_2.mp4
*************
has_b_frames=2
r_frame_rate=30153/1000
avg_frame_rate=30153/1000
nb_frames=257
nb_read_frames=N/A
r_frame_rate=0/0
avg_frame_rate=0/0
nb_frames=402
nb_read_frames=N/A
./files_with_negative_dimension_error_when_embedding/39030907344681975-JEWELRY.mp4

In particular, all of these seem to have two "streams", one of which has a frame rate of 0

Edit: ah, those are the audio streams. One would think cv2 would have no issue with this?

cleong110 avatar Nov 18 '24 18:11 cleong110

could you share a video file that fails (has 1000 fps)?

AmitMY avatar Nov 19 '24 10:11 AmitMY

Note: if I do the .webm to .mp4 conversion described in https://stackoverflow.com/questions/18123376/webm-to-mp4-conversion-using-ffmpeg, manually setting the fps to 24, the resulting .pose file ends up with 24 fps, and the .gif looks much more reasonable: [image: 01Mv9aUuOQfLOlBYKffO_24fps]

This is from 01Mv9aUuOQfLOlBYKffO.webm from the Sem-Lex dataset.

ffmpeg command used:

ffmpeg -fflags +genpts -i 01Mv9aUuOQfLOlBYKffO.webm -r 24 01Mv9aUuOQfLOlBYKffO.mp4

cleong110 avatar Nov 19 '24 16:11 cleong110

> could you share a video file that fails (has 1000 fps)?

DM'ed you some samples from the Sem-Lex dataset

cleong110 avatar Nov 19 '24 16:11 cleong110

Here's an example file I was sent

01Mv9aUuOQfLOlBYKffO.webm

The file is badly created. It has no duration, no fps, and the frames are not equally distributed.

Running

ffprobe -v error -select_streams v:0 -show_frames -show_entries frame=pts_time -of csv=p=0 01Mv9aUuOQfLOlBYKffO.webm

we get the frame times. If we plot them alongside a regression line, we see that they don't sit on the line: [image]

The pose-format library does not support arbitrary frame times.

Still, if we want to get the fps for this kind of video, we can:

  1. check whether the video has no duration and the fps is 1000 (as reported by cv2), and if so
  2. run an estimation of the fps, such as:
ffprobe -v error -select_streams v:0 -show_frames -show_entries frame=pts_time -of csv=p=0 01Mv9aUuOQfLOlBYKffO.webm | awk 'NR > 1 { diff = $1 - prev; sum += diff; count++ } { prev = $1 } END { print "Assumed FPS:", 1 / (sum / count) }' 

This results in: Assumed FPS: 30.1471
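
The same estimate can be written in Python. This is a rough sketch that mirrors the awk one-liner (the inverse of the mean gap between consecutive pts_time values); parse_pts_times and estimate_fps are hypothetical helpers, and the input is assumed to be the text printed by the ffprobe command above.

```python
def parse_pts_times(ffprobe_csv):
    """Parse ffprobe `-of csv=p=0` output, one pts_time value per line."""
    times = []
    for line in ffprobe_csv.splitlines():
        field = line.strip().split(",")[0]  # tolerate trailing commas
        try:
            times.append(float(field))
        except ValueError:
            continue  # skip blank lines and non-numeric values such as "N/A"
    return times


def estimate_fps(pts_times):
    """Estimate FPS as the inverse of the mean gap between frame times."""
    if len(pts_times) < 2:
        raise ValueError("need at least two frame timestamps")
    gaps = [b - a for a, b in zip(pts_times, pts_times[1:])]
    mean_gap = sum(gaps) / len(gaps)
    if mean_gap <= 0:
        raise ValueError("frame timestamps do not increase")
    return 1.0 / mean_gap
```

Because the frames are not evenly spaced, this gives an average rate only; it does not make the individual frame times regular.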

@cleong110 would you be interested in supporting this kind of case?

AmitMY avatar Dec 29 '24 16:12 AmitMY

That's pretty cool detective work! I thought that video looked weird. I think a large proportion of the videos in the Sem-Lex dataset have issues along these lines - when I ran the signclip embedding tool I was getting many, many errors that ultimately stemmed from the frame count being inflated.

As for whether to support automatic estimation of fps, I think that from an end-user perspective, one of the following would be nice:

  1. It "just works" on various videos, including ones with issues. That would be ideal, but I don't know if it's realistic.
  2. It at least detects and warns about issues like this, and lets the user handle it.

Honestly I sort of lean towards 2. If the source file has issues, warning about them rather than silently passing the issue downstream seems best.
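
As a rough illustration of option 2, a check along these lines could warn instead of passing the value downstream. This is only a sketch under assumptions: check_fps and the 1000 fps threshold are hypothetical, chosen because 1000.0 is the value reported for the problematic files in this thread.

```python
import warnings

# Value reported for the problematic .webm files in this thread (assumption:
# anything at or above it is treated as a placeholder, not a real rate).
SUSPICIOUS_FPS = 1000.0


def check_fps(fps, num_frames, source=""):
    """Return True if fps looks plausible; otherwise warn and return False."""
    if fps is None or fps <= 0:
        warnings.warn(f"{source}: no usable FPS reported")
        return False
    if fps >= SUSPICIOUS_FPS:
        duration = num_frames / fps
        warnings.warn(
            f"{source}: reported FPS {fps} implies a duration of "
            f"{duration:.3f}s for {num_frames} frames; metadata may be wrong"
        )
        return False
    return True
```

The caller can then decide whether to skip the file, re-encode it, or fall back to an estimated frame rate.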

cleong110 avatar Jan 06 '25 15:01 cleong110