charsiu
wav2vec2-fs model for Chinese alignment?
Hello, from your paper it seems that W2V2-FS's alignment is better than W2V2-FC's, but currently only an English W2V2-FS model is available. Have you tested W2V2-FS alignment on Chinese? If not, I'd like to train a model and test it myself, so I would also like to know the specific training steps. I have read your training code, but I don't know which dataset I should use. I'd like to use W2V2-FS to replace MFA. (Sorry for my bad English.)
Sorry for the late reply. I have tested on Chinese, but only on forced-aligned data, not human-labeled data, because human-labeled data of that kind that is also publicly available is extremely difficult to collect.
The original training data for English can be found here and here. I lost the Chinese data after graduating from UMich. However, you can create such training data by running MFA or the FS model on data from OpenSLR. You can also use those data to get started.
Thanks for your reply. I am trying to train the attention aligner (wav2vec2-fs, not wav2vec2-fc). I have trained the required BertMaskedLM, but I don't understand this line:
weights = torch.load('./models/neural_attention_aligner_forwardsum_10ms_true_quantizer.pt').state_dict()
from here
It looks like it loads neural_attention_aligner_forwardsum_10ms_true_quantizer.pt, but I don't have that weight file, so I'm wondering what it is.
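As an aside, the `torch.load(...).state_dict()` pattern implies the checkpoint was saved as a whole model object rather than a bare state dict. A minimal sketch of that save/load pattern (using a hypothetical `TinyAligner` module as a stand-in for the real aligner, not the actual charsiu code):

```python
import os
import tempfile

import torch
import torch.nn as nn


class TinyAligner(nn.Module):
    """Toy stand-in for the aligner; 'quantizer' mimics the wav2vec2 quantizer submodule."""
    def __init__(self):
        super().__init__()
        self.quantizer = nn.Linear(4, 4)
        self.head = nn.Linear(4, 2)

    def forward(self, x):
        return self.head(self.quantizer(x))


# Save the whole model object, the way the checkpoint in question appears to have been saved.
trained = TinyAligner()
path = os.path.join(tempfile.mkdtemp(), "aligner.pt")
torch.save(trained, path)

# Load it back and extract its weights, mirroring:
#   weights = torch.load('...').state_dict()
weights = torch.load(path, weights_only=False).state_dict()

# Copy only the quantizer weights into a fresh model; strict=False ignores
# the keys (e.g. the head) that we deliberately leave out.
fresh = TinyAligner()
fresh.load_state_dict(
    {k: v for k, v in weights.items() if k.startswith("quantizer")},
    strict=False,
)
```

After this, `fresh.quantizer` carries the trained weights while `fresh.head` stays randomly initialized, which is the usual way to warm-start one submodule from a larger checkpoint.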
Oh, sorry for the insufficient documentation. That is the wav2vec2 quantizer; I just loaded the weights of the wav2vec2 quantizer there. For Mandarin Chinese there might not be a pretrained model, so you might need to train it from scratch.
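If it helps, in HuggingFace transformers the wav2vec2 Gumbel vector quantizer lives on `Wav2Vec2ForPreTraining` as the `quantizer` attribute. A sketch of pulling out its weights (a randomly initialized default config is used here so the snippet makes no assumption about which pretrained checkpoint you have; for English you could instead call `from_pretrained` on a wav2vec2 pretraining checkpoint):

```python
from transformers import Wav2Vec2Config, Wav2Vec2ForPreTraining

# Build a wav2vec2 model with the default (randomly initialized) config.
model = Wav2Vec2ForPreTraining(Wav2Vec2Config())

# The Gumbel vector quantizer is a submodule, so its parameters can be
# extracted as an ordinary state dict.
quantizer_weights = model.quantizer.state_dict()
print(sorted(quantizer_weights.keys()))
```

Those extracted weights could then be loaded into the corresponding quantizer submodule of the aligner with `load_state_dict`.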
Thanks for your work and your reply. I see that you loaded the weights of the wav2vec2 quantizer, but I cannot find them. Where is the quantizer checkpoint path?
I am also trying to use W2V2-FS to replace MFA. Have you made any progress? How is the performance on Chinese data?