decord
decord copied to clipboard
Issues on videos with imprecise metadata
I noticed a failing unit test working on a PR. It was asserting a wrong expected result (incorrectly returned by older versions of decord), that it was fixed (by accident?) by a later version.
def test_video_reader_len():
vr = _get_default_test_video() # "flipping_a_pancake.mkv"
assert len(vr) == 311 #number of frames is actually 310, and current version gives the correct result
-
are these tests automatically run anywhere, and should they? :-)
-
the failing test is a symptom of something bad going on, so I took a deeper look.. The behavior of
GetFrameCountwas changed (and "fixed", as a side effect) whenGetFramePTSwas introduced in 0.3.6. With decord 0.3.5, on ""flipping_a_pancake.mkv", the function would fall back to an approximation based on FPS*duration (from metadata). In that video, the duration metadata seems to be imprecise (I think it's actually the duration of the audio track rather than the video stream), so the estimation is off by one, and this creates issues everywhere. For example: , a VideoReader of that video returns 311 frames (while the video actually only has 310), where the last "extra" frame actually contains another "random" frame of the video. From 0.3.6, it seems thatGetFrameCountwill (always?) returnframe_ts_.size(), which is more precise and fixes the issue. Should perhaps the approximation of GetFrameCount be completely removed byGetFrameCount, making sureframe_ts_.size()is always used, to avoid similar issues? -
That is not the only issue that occurred on that video. With bad or missing duration metadata, a lot of other things are still going wrong, mainly because that metadata is used in
Seek. With the latest version of decord, on "flipping_a_pancake.mkv", the video stream has bad duration in the metadata (same also for the other example video, both from Kinetics I guess)... leading to very bad things with anything that needs to Seek or SeekAccurate (anything that is not a sequential read of the entire video..). For example, this is an interesting failure (still occurring with the latest version):
from decord import VideoReader
vr = VideoReader("flipping_a_pancake.mkv")
frames = vr[:].asnumpy() # no Seek is involved, except Seek(0); so, correct result.
b = vr[152].asnumpy() # Seek(150) "fails", actually seeking to ts = 0
(b == frames[152]).all() # returns False
(b == frames[2]).all() # returns True
where the nearest keyframe is 150, but Seek(150) goes back to 0 due to a bad duration in the metadata (negative..). Is there a way to be robust to this kind of issue, maybe relying on the pts in frame_ts, rather then crucially relying on the "duration" metadata in FrameToPTS and other functions? But videos are messy, and the pts could have also their own issues, so I don't know if that would solve everything.
- I haven't got time to setup the CI/CD so it's not forced to check the unittests
- The GetFrameCount now relies on frame_ts_ strictly after the initial check from metadata.
- I didn't notice the issue actually. Good catch! I don't have a solid solution to handle various buggy metadata in all kinds of weird videos, and it will become very unfortunate complex problem if we couldn't rely on some key info about the video itself. Without inserting tons of error-check in the current pipeline what I can think of now is to add a validation pass called when metadata is loaded, we can try to correct the metadata or maybe raise an error if we couldn't handle it.
+1 on raising an error/warning when corrupted metadata presents. Then people can maybe pick those videos out and fix them or use something like FFMS2 to retrieve the correct frame offline.
Another weird issue where an incorrect frame is decoded by VideoReader with __getitem__ at a certain index.
Can't share the video -- but it's again a video with quite messy metadata (messy pts, sometimes duplicated, non-constant framerate, pts not sorted as dts). Not "obviously" wrong metadata though, and indeed it usually works fine, except in some weird cases where it fails.
Was seeing something weird, and doing some tests I was able to replicate the issue:
- opencv and decord obtain exactly the same frames, using get_batch
- extracting frames directly with ffmpeg also gives exactly the same results
- extracting with
__getitem__is fine, if done in consecutive order (so no "seek" is actually needed) - repeatedly calling
__getitem__, to have an "actual" random access, often (not always..) gives an incorrect result at a certain unlucky index (145).
from decord import VideoReader
import cv2
import numpy as np
import glob
vr = VideoReader("weird_video.mov")
NUM_FRAMES = 150
print("Extract batch with decord")
frames = vr[:NUM_FRAMES].asnumpy()
print("Check extraction with opencv is the same...", end=' ')
cap = cv2.VideoCapture("weird_video.mov")
for i in range(NUM_FRAMES):
_, img = cap.read()
#needed because video has rotation metadata..
img = np.flip(cv2.cvtColor(img, cv2.COLOR_BGR2RGB).transpose((1, 0, 2)), axis=0)
if not (frames[i] == img).all():
print(f"{i} different!")
break
cap.release()
if i == NUM_FRAMES-1: print("OK!")
print("Check extraction with ffmpeg is the same...", end=' ')
frames_ffmpeg = []
imgs = sorted(glob.glob("weird_frames/*"))
for i in range(NUM_FRAMES):
img = cv2.cvtColor(cv2.imread(imgs[i]), cv2.COLOR_BGR2RGB)
if not (frames[i] == img).all():
print(f"{i} different!")
break
if i == NUM_FRAMES-1: print("OK!")
print("Check random access in progressive order is correct (no Seek needed!)...", end=' ')
for i in range(NUM_FRAMES):
if not (frames[i] == vr[i].asnumpy()).all():
print(f"Random access for {i} not correct!")
break
if i == NUM_FRAMES-1: print("OK!")
print("Check random access in critical frame (145) is correct (Seek is needed!)")
for it, i in enumerate([145]*100):
extracted = vr[i].asnumpy()
if not (frames[i] == extracted).all():
print(f"{it}: Not true!", end=' ')
for k in range(5):
if (frames[i+k] == extracted).all():
print("Actually equal to", i+k)
break
else:
print(f"{it}: Correct! Equal to", i)
output:
Extract batch with decord
Check extraction with opencv is the same... OK!
Check extraction with ffmpeg is the same... OK!
Check random access in progressive order is correct (no Seek needed!)... OK!
Check random access in critical frame (145) is correct (Seek is needed!)...
0: Correct! Equal to 145
1: Not true! Actually equal to 146
This is not deterministic, it doesn't happen all the time.
Putting some output messages in the code, it looks like in SkipFrames the video_reader is popping before the correct frame was actually decoded. Adding a blunt sleep in video_reader.cc before popping from the decoder (obviously not a possible fix!) seems to "solve" the issue.
@@ -456,6 +456,7 @@ void VideoReader::SkipFrames(int64_t num) {
curr_frame_ += num;
while (num > 0) {
PushNext();
+ std::this_thread::sleep_for(std::chrono::milliseconds(100));
ret = decoder_->Pop(&frame);
if (!ret) continue;
// LOG(INFO) << "skip: " << num;
@sbebo Could you please test your videos on this pr? https://github.com/dmlc/decord/pull/78
It looks like it fixes both (while the new one perhaps it's slightly slower, but I haven't really benchmarked deeply).
On "flipping_a_pancake.mkv", on the branch of #78, no more issues in seeking:
from decord import VideoReader
vr = VideoReader("flipping_a_pancake.mkv")
frames = vr[:].asnumpy() # no Seek is involved, except Seek(0); so, correct result (hopefully)
b = vr[152].asnumpy()
(b == frames[152]).all() # returns True 👍
(while with decord 4.0 the variable b would actually still point to frame number 2!)
On the other crappy video I have (running the same code I used above):
With decord 4.0, random access often fails:
Extract batch with decord
Check extraction with opencv is the same... OK!
Check extraction with ffmpeg is the same... OK!
Check random access in progressive order is correct (no Seek) OK!
Check random access in critical frame (145) is correct (Seeking is needed)
100it [00:03, 29.52it/s]
Correct: 84/100 👎
On the branch of #78, random access no longer fails, it seems.
Extract batch with decord
Check extraction with opencv is the same... OK!
Check extraction with ffmpeg is the same... OK!
Check random access in progressive order is correct (no Seek) OK!
heck random access in critical frame (145) is correct (Seeking is needed)
100it [00:03, 28.20it/s]
Correct: 100/100 👍
(Also checked with more iterations, I have never seen it fail to extract the correct frame).
yeah! that's good news. could you please also test speed changes? ideally please set the num threads to 1.