Synthetic-Voice-Detection-Vocoder-Artifacts
Clarification Needed on Intra-dataset vs Cross-dataset Evaluation Metrics in Paper
I have some questions regarding the evaluation metrics and results presented in Sections 4.4 and 4.5.
Intra-dataset Evaluation (Section 4.4)
The paper reports a very low EER of 0.19% on the WaveFake dataset using the RawNet2 model.
- To confirm my understanding, was this evaluation performed with the model being trained and tested on the same WaveFake dataset?
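For reference, this is roughly how I understand the EER to be computed from per-utterance scores; it is only a minimal sketch with made-up labels and scores, not the repository's actual evaluation code:

```python
# Minimal EER sketch: find the operating point where the false-positive and
# false-negative rates cross. The labels/scores below are made-up examples.
import numpy as np
from sklearn.metrics import roc_curve

def compute_eer(labels, scores):
    """Equal error rate: the point where FPR == FNR, approximated at the closest threshold."""
    fpr, tpr, _ = roc_curve(labels, scores, pos_label=1)
    fnr = 1.0 - tpr
    idx = np.nanargmin(np.abs(fpr - fnr))      # threshold where the two rates are closest
    return (fpr[idx] + fnr[idx]) / 2.0

labels = np.array([1, 1, 0, 0, 1, 0])          # 1 = synthetic, 0 = bona fide
scores = np.array([0.92, 0.71, 0.12, 0.35, 0.88, 0.05])
print(f"EER = {compute_eer(labels, scores) * 100:.2f}%")
```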
Cross-dataset Evaluation (Section 4.5)
In contrast, the EER increases sharply to 26.95% when the model trained on the LibriSeVoc dataset is tested on the WaveFake dataset, which suggests poor generalization to unseen data.
- Are there any ongoing efforts to improve this aspect of the model, perhaps through domain adaptation techniques or exposure to a more diverse set of vocoders during training? (A toy sketch of what I mean by the latter is below.)
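To make the second question concrete, here is a toy leave-one-vocoder-out sketch of the kind of exposure I have in mind. The vocoder names, random features, and the logistic-regression stand-in for RawNet2 are purely illustrative assumptions, and it reuses the `compute_eer` helper from the sketch above:

```python
# Toy leave-one-vocoder-out protocol: train on several vocoder sources,
# hold one out, and measure how well the detector transfers to it.
# Features here are synthetic Gaussians, not real audio embeddings.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
SHIFTS = {"melgan": 0.6, "hifigan": 0.8, "waveglow": 1.0, "diffwave": 1.2}   # hypothetical vocoder pool

def toy_features(shift, n=200, dim=16):
    """Toy stand-in for per-utterance features; each vocoder gets its own mean shift."""
    real = rng.normal(0.0, 1.0, (n, dim))
    fake = rng.normal(shift, 1.0, (n, dim))
    X = np.vstack([real, fake])
    y = np.array([0] * n + [1] * n)            # 0 = bona fide, 1 = synthetic
    return X, y

for held_out, held_shift in SHIFTS.items():
    train_parts = [toy_features(s) for v, s in SHIFTS.items() if v != held_out]
    X_train = np.vstack([X for X, _ in train_parts])
    y_train = np.concatenate([y for _, y in train_parts])
    X_test, y_test = toy_features(held_shift)

    clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)   # stand-in for RawNet2
    scores = clf.predict_proba(X_test)[:, 1]
    eer = compute_eer(y_test, scores)          # helper from the sketch above
    print(f"held-out vocoder {held_out}: EER = {eer * 100:.2f}%")
```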
Hi Chandler, thank you very much for the questions. For your first question: yes, we split WaveFake into separate train and test partitions (a rough sketch of that kind of split is below), so the model was trained and evaluated on disjoint subsets of that dataset. For your second question: that is an excellent idea. We are currently working on it, and along the lines you suggest, we are trying to expose the model to a more diverse set of vocoders during training.
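For reference, a file-level split can be set up roughly like this; the folder layout, 80/20 ratio, and random seed are illustrative assumptions rather than the exact protocol used in the paper:

```python
# Illustrative file-level train/test split, assuming wav files live under
# real/ and fake/ subfolders of a hypothetical data directory.
from pathlib import Path
from sklearn.model_selection import train_test_split

root = Path("data/wavefake_experiment")        # hypothetical local layout
files = sorted(root.glob("real/*.wav")) + sorted(root.glob("fake/*.wav"))
labels = [0 if f.parent.name == "real" else 1 for f in files]   # 0 = bona fide, 1 = vocoded
assert files, "expected wav files under real/ and fake/"

train_files, test_files, y_train, y_test = train_test_split(
    files, labels, test_size=0.2, stratify=labels, random_state=42
)
print(f"{len(train_files)} train / {len(test_files)} test utterances")
```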