FrameSubsampler broken in version 1.3.0

Open libeanim opened this issue 1 year ago • 10 comments

When adding the FrameSubsampler to the config

subsampling:
    FrameSubsampler:
        args:
            frame_rate: 5
            downsample_method: 'fps'
            encode_format: 'mp4'

I get the following error message:

Traceback (most recent call last):
  File "/home/evobits/miniconda3/envs/laion/lib/python3.9/site-packages/video2dataset/workers/download_worker.py", line 102, in __call__
    self.download_shard(row)
  File "/home/evobits/miniconda3/envs/laion/lib/python3.9/site-packages/video2dataset/workers/download_worker.py", line 161, in download_shard
    writer_encode_formats["video"] = self.subsamplers["video"][0].encode_formats["video"]
AttributeError: 'FrameSubsampler' object has no attribute 'encode_formats'
shard ./TEST/results2/_tmp/9.feather failed with error 'FrameSubsampler' object has no attribute 'encode_formats'

Is this a typo in download_worker.py, or is there an issue with my config?

libeanim avatar Feb 09 '24 17:02 libeanim

Seems to be related to #263

libeanim avatar Feb 09 '24 17:02 libeanim

Seems to be a discrepancy between encode_format and encode_formats. Need to choose one and use it everywhere.

rom1504 avatar Feb 09 '24 17:02 rom1504

https://github.com/iejMac/video2dataset/blob/3e101a126dda134d2a9b3ee82fa599c3125b5da0/video2dataset/workers/download_worker.py#L89 looks like there's no s in the frame subsampler

rom1504 avatar Feb 09 '24 17:02 rom1504

https://github.com/iejMac/video2dataset/blob/3e101a126dda134d2a9b3ee82fa599c3125b5da0/video2dataset/workers/download_worker.py#L161 so yeah that line is wrong, and this seems untested

rom1504 avatar Feb 09 '24 17:02 rom1504

Probably the easiest fix is to migrate all to encode_formats
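
A minimal sketch of what such a migration could look like, assuming the subsampler should always expose a plural `encode_formats` dict while still accepting the old singular `encode_format` argument. `FrameSubsamplerSketch` and `normalize_encode_formats` are hypothetical names used for illustration only, not the library's actual code:

```python
from typing import Dict, Optional


def normalize_encode_formats(
    encode_format: Optional[str] = None,
    encode_formats: Optional[Dict[str, str]] = None,
    default: str = "mp4",
) -> Dict[str, str]:
    """Return a modality -> format dict from either argument style."""
    if encode_format is not None and encode_formats is not None:
        raise ValueError("pass either encode_format or encode_formats, not both")
    if encode_formats is not None:
        return dict(encode_formats)  # copy so callers cannot mutate our state
    return {"video": encode_format if encode_format is not None else default}


class FrameSubsamplerSketch:
    """Hypothetical stand-in for a subsampler; only the attribute handling is shown."""

    def __init__(self, frame_rate, downsample_method="fps",
                 encode_format=None, encode_formats=None):
        self.frame_rate = frame_rate
        self.downsample_method = downsample_method
        # The worker in the traceback reads `.encode_formats["video"]`,
        # so the plural dict is always set, whichever argument was passed.
        self.encode_formats = normalize_encode_formats(encode_format, encode_formats)


# Both config styles from this thread end up with the same attribute.
old_style = FrameSubsamplerSketch(frame_rate=5, encode_format="mp4")
new_style = FrameSubsamplerSketch(frame_rate=5, encode_formats={"video": "mp4"})
```

Accepting both spellings for a while would keep existing configs working; whether that is worth the extra code compared to a hard rename is a judgment call for the maintainers.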

rom1504 avatar Feb 09 '24 17:02 rom1504

https://github.com/iejMac/video2dataset/pull/287/files this was broken in this PR

rom1504 avatar Feb 09 '24 17:02 rom1504

The main problem here is the absence of a test for this subsampler usage

rom1504 avatar Feb 09 '24 17:02 rom1504

https://github.com/iejMac/video2dataset/pull/271/files that fix seems to be going in the wrong direction

rom1504 avatar Feb 09 '24 17:02 rom1504

Just for testing I have added

self.encode_formats = {'video': encode_format}

to the FrameSubsampler.__init__ method. The original problem seems to be solved but now I am getting this error:

Downloading starting now, check your bandwidth speed (with bwm-ng)your cpu (with htop), and your disk usage (with iotop)!
Traceback (most recent call last):
  File "/home/evobits/miniconda3/envs/laion/lib/python3.9/site-packages/video2dataset/workers/download_worker.py", line 237, in download_shard
    for modality in subsampled_streams:
RuntimeError: dictionary keys changed during iteration
Sample 0 failed to download: dictionary keys changed during iteration

Not entirely sure how to proceed.
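
For reference, Python raises this `RuntimeError` when a dict's keys are replaced while a `for` loop is iterating over that same dict. The loop body at line 237 is not shown here, so the snippet below is only a generic sketch of this error class and the usual workaround (iterating over a snapshot of the keys). The function names and the renaming logic are made up for illustration:

```python
def rename_keys_unsafe(streams):
    """Renames keys while iterating the dict directly; fails on Python 3.8+."""
    for modality in streams:
        # pop + insert keeps the size the same but changes the key set
        streams[modality + "_sub"] = streams.pop(modality)
    return streams


def rename_keys_safe(streams):
    """Iterates over a snapshot of the keys, so mutating the dict is fine."""
    for modality in list(streams):
        streams[modality + "_sub"] = streams.pop(modality)
    return streams


try:
    rename_keys_unsafe({"video": [b"frames"]})
except RuntimeError as err:
    print(f"unsafe loop failed: {err}")

print(rename_keys_safe({"video": [b"frames"], "audio": [b"samples"]}))
```

If the worker loop adds or renames modalities in `subsampled_streams` while looping over it, changing the loop header to iterate over `list(subsampled_streams)` might be enough, but that is an assumption about code not quoted in this thread.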

libeanim avatar Feb 15 '24 14:02 libeanim

What about this fix? https://github.com/marianna13/video2dataset/blob/6e9d704b687cf3a2311f565b2ca387eeed73337d/video2dataset/subsamplers/frame_subsampler.py#L39

I use the following config:

subsampling: 
    FrameSubsampler:
        args:
            frame_rate: 5
            downsample_method: 'fps'
            encode_formats: 
                video: 'mp4'

marianna13 avatar Feb 16 '24 12:02 marianna13