AISHELL-4
Amount of Clean Non-Overlapped data
It looks like the amount of non-overlapped data is much smaller than the overall corpus. I am seeing less than 20 hours. Is this correct?
Thanks, Michael Picheny
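Thank you for your interest. Maybe our overlap calculation methods are different? Our method is: overlap length / all speech length, where both are summed over speakers. For example, if there are 2 speakers, each speaks for 10 s, and they overlap for 5 s, then each speaker has 5 s of overlapped speech and the ratio is (5+5)/20 = 0.5.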
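To make the difference concrete, here is a minimal Python sketch of the per-speaker ratio described above, next to a timeline-based ratio (overlapped wall-clock time / wall-clock time with any speech). The timeline-based variant is only an assumption about what another tool might compute, not something confirmed in this thread. The function name `overlap_ratios` and the `(speaker, start, end)` segment layout are hypothetical, and the sketch assumes a single speaker's own segments never overlap each other.

```python
from typing import List, Tuple

# Hypothetical segment layout: (speaker_id, start_sec, end_sec)
Segment = Tuple[str, float, float]


def overlap_ratios(segments: List[Segment]) -> Tuple[float, float]:
    """Return (per_speaker_ratio, timeline_ratio) for a list of segments.

    per_speaker_ratio: overlapped speech summed over speakers / all speech
                       summed over speakers (the method described above).
    timeline_ratio:    wall-clock time with >= 2 active speakers / wall-clock
                       time with >= 1 active speaker (an assumed alternative).
    """
    # Sweep line: +1 when a segment starts, -1 when it ends.
    events = []
    for _speaker, start, end in segments:
        events.append((start, 1))
        events.append((end, -1))
    # At equal times -1 sorts before +1, so touching segments do not overlap.
    events.sort()

    active = 0          # number of speakers talking right now
    prev_time = None    # time of the previous event
    speech = overlapped = 0.0                    # summed over speakers
    timeline_speech = timeline_overlap = 0.0     # wall-clock durations

    for time, delta in events:
        if prev_time is not None and active > 0:
            dt = time - prev_time
            speech += active * dt
            timeline_speech += dt
            if active >= 2:
                overlapped += active * dt
                timeline_overlap += dt
        active += delta
        prev_time = time

    per_speaker = overlapped / speech if speech else 0.0
    timeline = timeline_overlap / timeline_speech if timeline_speech else 0.0
    return per_speaker, timeline


# The example from this thread: two speakers, 10 s each, 5 s of overlap.
example = [("A", 0.0, 10.0), ("B", 5.0, 15.0)]
per_speaker, timeline = overlap_ratios(example)
print(f"per-speaker ratio: {per_speaker:.3f}")  # (5+5)/20
print(f"timeline ratio:    {timeline:.3f}")     # 5/15
```

On the two-speaker example the per-speaker ratio is 0.5 while the timeline ratio is about 0.333, so the same recording can yield noticeably different overlap figures depending on which definition a script uses.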
I am using the methodology described in https://github.com/DanBerrebbi/AISHELL-4.git which I thought was based on your original work, but perhaps not?