Torchcodec decoding
Closes #7607
New signatures
Audio
Audio(sampling_rate: Optional[int] = None, mono: bool = True, decode: bool = True, stream_index: Optional[int] = None)
Audio.encode_example(self, value: Union[str, bytes, bytearray, dict, "AudioDecoder"]) -> dict
Audio.decode_example(self, value: dict, token_per_repo_id: Optional[dict[str, Union[str, bool, None]]] = None) -> "AudioDecoder":
Video
Video(decode: bool = True, stream_index: Optional[int] = None, dimension_order: Literal['NCHW', 'NHWC'] = 'NCHW', num_ffmpeg_threads: int = 1, device: Optional[Union[str, "torch.device"]] = 'cpu', seek_mode: Literal['exact', 'approximate'] = 'exact')
Video.encode_example(self, value: Union[str, bytes, bytearray, Example, np.ndarray, "VideoDecoder"]) -> Example:
Video.decode_example(self, value: Union[str, Example], token_per_repo_id: Optional[dict[str, Union[bool, str]]] = None, ) -> "VideoDecoder":
Notes
Audio features constructor takes in 1 new optional param stream_index which is passed to the AudioDecoder constructor to select the stream index of a file. Audio feature can now take in torchcodec.decoders.AudioDecoder as input to encode_example() Audio feature decode_example() returns torchcodec.decoders.AudioDecoder
Video feature constructor takes in 5 new optional params stream_index, dimension_order, num_ffmpeg_threads, device, seek_mode all of which are passed to VideoDecoder constructor Video feature decode_example() returns torchcodec.decoders.VideoDecoder Video feature can now take in torchcodec.decoders.VideoDecoder as input to encode_example()
All test cases have been updated to reflect these changes All documentation has also been updated to reflect these changes.
Both VideoDecoder and AudioDecoder when formatted with (np_formatter, tf_formatter, etc) will ignore the type and return themselves. Formatting test cases were updated accordingly to reflect this. (Pretty simple to make this not the case if we want though)
Errors
This test case from tests/packaged_modules/test_audiofolder.py
@require_librosa
@require_sndfile
@pytest.mark.parametrize("streaming", [False, True])
def test_data_files_with_metadata_and_archives(streaming, cache_dir, data_files_with_zip_archives):
audiofolder = AudioFolder(data_files=data_files_with_zip_archives, cache_dir=cache_dir)
audiofolder.download_and_prepare()
datasets = audiofolder.as_streaming_dataset() if streaming else audiofolder.as_dataset()
for split, data_files in data_files_with_zip_archives.items():
num_of_archives = len(data_files) # the metadata file is inside the archive
expected_num_of_audios = 2 * num_of_archives
assert split in datasets
dataset = list(datasets[split])
assert len(dataset) == expected_num_of_audios
# make sure each sample has its own audio (all arrays are different) and metadata
assert (
sum(np.array_equal(dataset[0]["audio"].get_all_samples().data.numpy(), example["audio"].get_all_samples().data.numpy()) for example in dataset[1:])
== 0
)
assert len({example["text"] for example in dataset}) == expected_num_of_audios
assert all(example["text"] is not None for example in dataset)
Fails now because AudioDecoder needs to access the files after the lines below are run, but there seems to be some context issues. The file the decoder is trying to read is closed before the decoder gets the chance to decode it.
audiofolder.download_and_prepare()
datasets = audiofolder.as_streaming_dataset() if streaming else audiofolder.as_dataset()