
Open-source Audio Datasets

audio-catalog

Hacktoberfest is a month-long virtual festival of open source! Participants give back to the community by completing pull requests, participating in events, and donating to open-source projects. This project is part of Hacktoberfest 2021, where participants enrich the open-source audio datasets hosted on DagsHub.

Quick Start to Contribution

What does the DagsHub community contribute?

This year we'd like to focus our contribution on the audio domain. For that, we added audio data catalog capabilities to DagsHub! You can now upload an audio file to DagsHub, view its spectrogram and waveform, and even listen to it! You can see a vivid example of this (extremely cool) feature in our Librispeech-ASR-corpus project.

To help audio practitioners leverage this new feature, we want to enrich open-source audio datasets on DagsHub. This is where you can contribute to the data science community!

How to contribute?

  • Claim the dataset you wish to contribute from the list (KUDOS to jim-schwoebel) by opening a new issue on the GitHub repository and naming it after the dataset. Please make sure that the dataset hasn't already been claimed.
  • Open a new DagsHub repository and upload the data to its DVC storage (e.g., dataset repository).
  • Write information about the dataset in the README file (e.g., Librispeech ASR corpus README).
  • Add relevant tags to the repository and files.
  • Add the following labels to the repository:
    • dataset
    • audio
    • hacktoberfest
  • In the GitHub audio-datasets project:
    • Open a new branch named after the dataset.
    • Add a directory named after the dataset with the README file.
    • Commit and push the changes to GitHub.
    • Create a pull request on GitHub.
  • Optional: Share the project on the DagsHub Hacktoberfest 2021 Discord channel.
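
The GitHub part of the steps above (new branch, dataset directory with a README, commit, push) can be sketched in Python as follows. This is only an illustrative sketch, not an official tool of this project: the helper names (`dataset_slug`, `scaffold_dataset_dir`, `git_commands`), the `README.md` file name, the hyphenated directory naming, the commit message, and the `origin` remote are all assumptions. The script only prints the git commands so you can review them before running anything.

```python
from __future__ import annotations

import re
import tempfile
from pathlib import Path


def dataset_slug(name: str) -> str:
    """Turn a dataset title into a branch/directory name (assumed convention)."""
    # Replace every run of non-alphanumeric characters with a single hyphen.
    slug = re.sub(r"[^A-Za-z0-9]+", "-", name.strip())
    return slug.strip("-")


def scaffold_dataset_dir(repo_root: Path, name: str, description: str) -> Path:
    """Create <repo_root>/<slug>/README.md with a minimal placeholder README."""
    dataset_dir = Path(repo_root) / dataset_slug(name)
    dataset_dir.mkdir(parents=True, exist_ok=True)
    readme = dataset_dir / "README.md"  # file name is an assumption
    readme.write_text(f"# {name}\n\n{description}\n", encoding="utf-8")
    return readme


def git_commands(name: str, remote: str = "origin") -> list[list[str]]:
    """Return the git commands for the branch, commit, and push steps."""
    slug = dataset_slug(name)
    return [
        ["git", "checkout", "-b", slug],
        ["git", "add", f"{slug}/README.md"],
        ["git", "commit", "-m", f"Add {name} dataset"],
        ["git", "push", "-u", remote, slug],
    ]


if __name__ == "__main__":
    # Demo in a temporary directory; point repo_root at your local clone instead.
    with tempfile.TemporaryDirectory() as tmp:
        readme = scaffold_dataset_dir(
            Path(tmp), "Librispeech ASR corpus", "Describe the dataset here."
        )
        print("Created", readme.relative_to(tmp))
    for cmd in git_commands("Librispeech ASR corpus"):
        print(" ".join(cmd))
```

To use it for real, pass the path of your local audio-datasets clone as `repo_root`, then run the printed commands from inside that clone and open the pull request on GitHub.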