
AssertionError: choose a window size 400 that is [2, 1]

Open GrafKnusprig opened this issue 2 weeks ago • 2 comments

I am trying to use the feature extractor on my audio files. All of them are sampled at 16000 Hz and are 5 seconds long, so waveform.shape[1] is 80000.

input_values = feature_extractor(waveform, sampling_rate=16000, return_tensors="pt").input_values

This call fails with `AssertionError: choose a window size 400 that is [2, 1]`, and I don't know how to fix it.

Here is the whole thing:

```python
import torch
import torchaudio
from tqdm import tqdm

# feature_extractor and dataset_dict are created earlier in my script (not shown here).


def preprocess_function(examples):
    audio_files = examples['file_path']
    inputs = {'input_values': []}
    for audio_file in tqdm(audio_files, desc="Preprocessing dataset"):
        # torchaudio.load returns a tensor of shape (channels, samples)
        waveform, sample_rate = torchaudio.load(audio_file)
        # Ensure sample rate is 16000 Hz
        assert sample_rate == 16000, f"Expected sample rate of 16000 Hz, but got {sample_rate} Hz"
        # Assuming all audio files are 5 seconds long
        max_len = 16000 * 5  # 5 seconds at 16000 Hz
        # Pad or truncate to the maximum length
        print(waveform.shape[1])
        if waveform.shape[1] > max_len:
            waveform = waveform[:, :max_len]
        else:
            waveform = torch.nn.functional.pad(waveform, (0, max_len - waveform.shape[1]), "constant", 0)
        # This is the call that raises the AssertionError
        input_values = feature_extractor(waveform, sampling_rate=16000, return_tensors="pt").input_values
        inputs['input_values'].append(input_values.squeeze(0))
    return inputs


processed_dataset = dataset_dict.map(preprocess_function, batched=True, remove_columns=['file_path'])
```
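
One possible reading of the `[2, 1]` in the message, not confirmed anywhere in this thread: the upper bound looks like the length of the first axis of whatever was passed in. Since `waveform.shape[1]` is 80000, the first axis is the channel axis, which would have length 1 for mono audio. The sketch below only illustrates that reading in plain Python. `check_window_size` and `first_channel` are hypothetical stand-ins written for this example and are not the library's actual code; the nested list stands in for a `(1, 80000)` tensor.

```python
def check_window_size(window_size, waveform):
    """Hypothetical stand-in for a length check like the one in the error message."""
    upper = len(waveform)  # len() only sees the FIRST axis
    assert 2 <= window_size <= upper, (
        f"choose a window size {window_size} that is [2, {upper}]"
    )


def first_channel(waveform):
    """Return the first channel of a (channels, samples) nested sequence."""
    return waveform[0]


# Nested list standing in for a (1, 80000) waveform: 1 channel, 80000 samples.
two_d = [[0.0] * 80000]

try:
    check_window_size(400, two_d)  # first axis has length 1, so this fails
    message = None
except AssertionError as err:
    message = str(err)

print(message)

# With a one-dimensional sequence the same check passes without raising.
check_window_size(400, first_channel(two_d))
```

If that reading is right, passing a one-dimensional array to the feature extractor (for example `waveform.squeeze(0).numpy()` for mono audio) instead of the 2-D tensor may be worth trying.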

GrafKnusprig, Jun 19 '24 14:06