peoples-speech

Things to do before NeurIPS

Open • galv opened this issue 4 years ago • 1 comment

  • [ ] Create two separate datasets to distribute, one CC-BY, one CC-BY-SA.
  • [ ] Rerun YAMNet on the entire dataset. This requires making it more performant; see #40.
  • [ ] Send data to be hand-transcribed.
    • [ ] Optionally, do audio-based deduplication first.
  • [ ] Add text data deduplication to the data creation pipeline.
  • [ ] Train Kaldi and/or NeMo models on the dataset, and provide fixes to the dataset based on this work.

Adding more as time goes on...
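A minimal sketch of what the text-deduplication step could look like, assuming each utterance is a dict with hypothetical `id` and `text` fields. It only catches exact duplicates after normalization (case, punctuation, whitespace); it is an illustration, not the pipeline's actual method, and near-duplicate detection would need a separate technique.

```python
import hashlib
import re
import unicodedata
from typing import Iterable


def normalize_text(text: str) -> str:
    """Canonicalize a transcript so trivially different copies compare equal."""
    text = unicodedata.normalize("NFKC", text).lower()
    text = re.sub(r"[^\w\s']", " ", text)  # replace punctuation (except apostrophes) with spaces
    return " ".join(text.split())          # collapse runs of whitespace


def dedup_by_text(records: Iterable[dict]) -> list[dict]:
    """Keep the first record for each normalized transcript; drop later exact duplicates.

    The "id"/"text" record layout is a hypothetical assumption for this sketch.
    """
    seen: set[str] = set()
    kept = []
    for record in records:
        key = hashlib.sha1(normalize_text(record["text"]).encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            kept.append(record)
    return kept


if __name__ == "__main__":
    utterances = [
        {"id": "a", "text": "Hello, world!"},
        {"id": "b", "text": "hello   world"},
        {"id": "c", "text": "Something else."},
    ]
    print([r["id"] for r in dedup_by_text(utterances)])  # ['a', 'c']
```

Hashing the normalized string keeps memory per utterance constant regardless of transcript length, which matters at the scale of the full dataset.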

galv • Sep 08 '21 18:09

Poster + 3-minute talk due: October 18th

Camera-ready paper due: November 6th

NeurIPS (dataset release): early December

galv • Sep 13 '21 16:09