Audio dataset is not decoding on 4.1.1

Open thewh1teagle opened this issue 2 months ago • 3 comments

Describe the bug

The audio column remains a non-decoded object even when its rows are accessed.

from datasets import load_dataset

dataset = load_dataset("MrDragonFox/Elise", split="train")
dataset[0]  # the 'audio' entry doesn't show 'array' etc.

Works fine with datasets==3.6.0

Followed the docs in

  • https://huggingface.co/docs/datasets/en/audio_load

Steps to reproduce the bug

from datasets import load_dataset

dataset = load_dataset("MrDragonFox/Elise", split="train")
dataset[0]  # the 'audio' entry doesn't show 'array' etc.

Expected behavior

It should decode the audio when accessing the element.

Environment info

datasets 4.1.1, Ubuntu 22.04

Related

  • https://github.com/huggingface/datasets/issues/7707

thewh1teagle • Oct 05 '25 06:10

Previously (datasets<=3.6.0), audio columns were decoded automatically when accessing a row. Now, for performance reasons, audio decoding is lazy by default: you just see the file path unless you explicitly cast the column to Audio.

Here’s the fix (following the current datasets audio docs):

from datasets import load_dataset, Audio

dataset = load_dataset("MrDragonFox/Elise", split="train")

# Explicitly decode the audio column
dataset = dataset.cast_column("audio", Audio(sampling_rate=16_000))

print(dataset[0]["audio"])
# {'path': '...', 'array': array([...], dtype=float32), 'sampling_rate': 16000}

hBouanane • Oct 05 '25 10:10

@haitam03-yo's comment is right that the data is no longer decoded by default, but here is how it works in practice now:

From datasets v4, audio data are read as AudioDecoder objects from torchcodec. This doesn't decode the data by default, but you can call audio.get_all_samples() to decode the audio.

See the documentation on how to process audio data here: https://huggingface.co/docs/datasets/audio_process
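
As a rough illustration of the point above, here is a minimal sketch that wraps the `get_all_samples()` call in a small helper returning a dict shaped like the old output. The helper name `decode_audio`, the `demo` function, and the `FakeDecoder` stand-in are hypothetical names made up for this example; only `get_all_samples()` and the `data` / `sample_rate` attributes of its result come from torchcodec. It assumes the column is named "audio", as in the snippets in this thread.

```python
from types import SimpleNamespace


def decode_audio(decoder):
    """Decode an AudioDecoder-like object into a dict with the
    'array' and 'sampling_rate' keys used by datasets<=3.6.

    Hypothetical helper, not part of the datasets API.
    """
    samples = decoder.get_all_samples()  # the actual decoding happens here
    return {
        # With a real decoder this is a torch.Tensor of shape
        # (num_channels, num_samples), not the 1-D NumPy array of datasets 3.x.
        "array": samples.data,
        "sampling_rate": samples.sample_rate,
    }


def demo():
    """Real usage; assumes datasets>=4 with torchcodec installed and network access."""
    from datasets import load_dataset

    dataset = load_dataset("MrDragonFox/Elise", split="train")
    audio = decode_audio(dataset[0]["audio"])
    print(audio["array"].shape, audio["sampling_rate"])


class FakeDecoder:
    """Stand-in for offline checking only; not a real torchcodec decoder."""

    def get_all_samples(self):
        return SimpleNamespace(data=[[0.0, 0.5, -0.5]], sample_rate=16_000)


# Offline check of the helper with the stand-in object
result = decode_audio(FakeDecoder())
```

Calling `demo()` runs the same helper against the real dataset. If a NumPy array is needed, the tensor under "array" can be converted with its `.numpy()` method.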

lhoestq • Oct 06 '25 10:10

To resolve this, you need to explicitly cast the audio column to the Audio feature. This will decode the audio data and make it accessible as an array. Here is the corrected code snippet:

from datasets import load_dataset, Audio

# Load your dataset
dataset = load_dataset("MrDragonFox/Elise", split="train")

# Explicitly cast the 'audio' column to the Audio feature
dataset = dataset.cast_column("audio", Audio(sampling_rate=16_000))

# Now you can access the decoded audio array
print(dataset[0]["audio"])

By adding the cast_column step, you tell the datasets library to decode the audio data at the specified sampling rate, so you can access the audio array as in previous versions.

GULSHANKUMAR6079 • Oct 06 '25 14:10