
[FR] Add support for audio samples

Open • lminer opened this issue 4 years ago • 6 comments

Are there any plans to add audio support to this project?

lminer • Aug 12 '21

Hey @lminer 👋

Can you share a bit more about what you’d like to do with audio? Your use case, what you do today, etc.

You can currently (as of fiftyone==0.12.0) listen to audio when playing back video datasets in the App, and we’re also planning to add spectrograms.

In general, audio is something that comes up pretty often, and supporting it more fully (depending on what that means for a critical mass of users) is definitely plausible.

brimoor • Aug 12 '21

Sure thing! We work in audio source separation, so spectrogram visualization with masks would be useful. I didn't realize you could already play back audio. That would also be very helpful.

Another basic use case we have is that some of our data is poorly labeled. For example, a file might be labeled as a bass when it is actually a bass vocalist, a bass clarinet, a synthesizer, etc. It seems like Voxel51 could be very helpful for this, although I imagine that the base model you have for generating embeddings wouldn't transfer very well to audio spectrograms.

lminer • Aug 12 '21

Interesting, thanks for sharing. It sounds like what you'd like to see aligns well with what is already on the near-term roadmap for the tool.

Regarding embeddings: I definitely think that using FiftyOne to visualize embeddings could be a really nice workflow for your project. You're right that the out-of-the-box (OOTB) model we provide for generating image/object embeddings likely wouldn't work well for spectrograms, but note that you can generate your own embeddings outside of FiftyOne and pass them to the relevant visualization methods. This tutorial shows an example of that.

Without thinking too much about it, I would definitely try concatenating audio + visual features into one embedding and then visualizing those embeddings in FiftyOne. That should capture differences like the bass vs. bass vocalist case you mentioned.
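Here is a minimal sketch of the concatenation idea. It assumes that per-sample audio features (for example, from a spectrogram model) and visual features have already been computed elsewhere. Each block is L2-normalized before concatenation so that neither modality dominates distances just because its raw values are larger. The function names and the `audio_weight` knob are hypothetical illustrations, not part of FiftyOne's API:

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a feature vector to unit L2 norm; all-zero vectors are returned as-is."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return [float(x) for x in vec]
    return [x / norm for x in vec]


def concat_embedding(
    audio_feats: Sequence[float],
    visual_feats: Sequence[float],
    audio_weight: float = 1.0,  # hypothetical knob to emphasize the audio block
) -> List[float]:
    """Join one sample's audio and visual features into a single embedding.

    Each block is normalized on its own so both contribute on the same scale,
    regardless of their raw magnitudes or dimensionality.
    """
    audio = [audio_weight * x for x in l2_normalize(audio_feats)]
    visual = l2_normalize(visual_feats)
    return audio + visual


def build_embeddings(
    audio_rows: Sequence[Sequence[float]],
    visual_rows: Sequence[Sequence[float]],
    audio_weight: float = 1.0,
) -> List[List[float]]:
    """Build one joint embedding per sample; rows must be aligned by sample."""
    if len(audio_rows) != len(visual_rows):
        raise ValueError("audio and visual features must have one row per sample")
    return [
        concat_embedding(a, v, audio_weight)
        for a, v in zip(audio_rows, visual_rows)
    ]


if __name__ == "__main__":
    # Toy features for two samples (made-up numbers, for illustration only).
    audio = [[3.0, 4.0], [0.0, 2.0]]
    visual = [[1.0, 0.0, 0.0], [0.0, 0.0, 5.0]]
    for row in build_embeddings(audio, visual):
        print(row)
```

With the toy inputs, the first sample becomes `[0.6, 0.8, 1.0, 0.0, 0.0]`. The resulting rows could then be converted to a NumPy array of shape `(num_samples, num_dims)` and passed as the `embeddings` argument of `fiftyone.brain.compute_visualization()`, which is the "bring your own embeddings" workflow the comment above describes. Per-block normalization is just one reasonable design choice. Raising `audio_weight` would make audio differences, such as bass vs. bass clarinet, count more in the resulting plot.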

Shameless plug: the project is all open-source, so if you'd like to help get audio support released sooner, we'd be happy to loop you in to make it happen!

brimoor • Aug 12 '21

Any news on supporting audio samples? It would be cool if audio files could be labeled/viewed like in Raven Sound Analysis.

LimitlessGreen • Mar 05 '23

Ping @brimoor

LimitlessGreen • May 23 '23

This would be awesome!

jpaasen • Nov 02 '23