voxceleb_unsupervised
Augmentation adversarial training for self-supervised speaker recognition
Hi, thank you for your amazing work. I'm wondering whether there are instructions for loading both the face image frames and the speech segments.
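import numpy
from scipy import signal

def gen_echo(ref, rir, filterGain):
    # Scale the room impulse response (RIR) by the filter gain.
    rir = numpy.multiply(rir, pow(10, 0.1 * filterGain))
    # Convolve the reference signal with the RIR and truncate to the input length.
    echo = signal.convolve(ref, rir, mode='full')[:len(ref)]
    return echo

In this function, the rir data type is float32, but the ref data type...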
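One way to sidestep the dtype question raised above is to cast both inputs to a common float type before convolving. The sketch below is a hypothetical variant (`gen_echo_float` is an illustrative name, not part of the repository). It assumes `ref` may arrive as integer PCM, for example int16. It only unifies the dtype and does not rescale integer samples to the [-1, 1] range.

```python
import numpy
from scipy import signal


def gen_echo_float(ref, rir, filterGain):
    # Hypothetical variant of gen_echo: cast both inputs to float32 so the
    # convolution never mixes integer and floating-point dtypes.
    ref = numpy.asarray(ref, dtype=numpy.float32)
    rir = numpy.asarray(rir, dtype=numpy.float32)
    # Scale the RIR by 10 ** (0.1 * filterGain), exactly as in the original snippet.
    rir = rir * numpy.float32(pow(10, 0.1 * filterGain))
    # Full convolution, truncated to len(ref) so the echo stays aligned with ref.
    echo = signal.convolve(ref, rir, mode='full')[:len(ref)]
    # Return float32 regardless of which convolution method scipy picked.
    return echo.astype(numpy.float32)


# Usage: an int16 impulse convolved with a two-tap float32 RIR at 0 dB gain.
ref = numpy.array([1, 0, 0, 0], dtype=numpy.int16)
rir = numpy.array([1.0, 0.5], dtype=numpy.float32)
echo = gen_echo_float(ref, rir, 0)
```

With these inputs the output is a float32 array close to `[1.0, 0.5, 0.0, 0.0]`. If `ref` is integer PCM, dividing it by 32768 before the call keeps the output in the usual floating-point audio range.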
Thanks for the great work. Could you describe specifically how the 1000 pre-computed RIR filters were obtained?