
wav2vec2-fs model for Chinese alignment?

Open · funnymdzz opened this issue · 5 comments

Hello, from your paper it seems that W2V2-FS's alignment is better than W2V2-FC's, but currently there is only an English W2V2-FS model. Have you tested W2V2-FS alignment on Chinese? If not, I'd like to train one and test it, and I would like to know the specific training steps. I have read your training code, but I don't know which dataset I should use. I'd like to use W2V2-FS to replace MFA. (Sorry for my bad English.)

funnymdzz · Aug 01 '23

Sorry for the late reply. I have tested on Chinese, but only on force-aligned data, not human-labeled data, because it is extremely difficult to find such human-labeled data that is publicly available.

The original training data for English can be found here and here. I lost the Chinese data after graduating from UMich. However, you can create such training data by running MFA or the FS model on data from OpenSLR. You can also use these data to get started.
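For example, a minimal sketch of turning MFA's TextGrid output into 10 ms frame-level phone labels could look like this (not the exact charsiu preprocessing; the `textgrid` package, the tier name "phones", the silence label, and the example filename are all assumptions):

```python
# Sketch: convert an MFA TextGrid into 10 ms frame-level phone labels.
# Assumes the `textgrid` package (pip install textgrid) and a tier named
# "phones" (MFA's default); the "[SIL]" silence label is also an assumption.
import textgrid

def textgrid_to_frame_labels(path, frame_shift=0.01, phone_tier="phones"):
    tg = textgrid.TextGrid.fromFile(path)
    tier = next(t for t in tg.tiers if t.name == phone_tier)
    n_frames = int(tier.maxTime / frame_shift)
    labels = []
    for i in range(n_frames):
        t = (i + 0.5) * frame_shift  # label each frame by its center time
        mark = next(
            (iv.mark for iv in tier if iv.minTime <= t < iv.maxTime), ""
        )
        labels.append(mark if mark else "[SIL]")  # empty intervals = silence
    return labels

# e.g. labels = textgrid_to_frame_labels("BAC009S0002W0122.TextGrid")
```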

lingjzhu · Aug 17 '23

Thanks for your reply. I am trying to train the attention aligner (wav2vec2-fs, not wav2vec2-fc). I have trained the required BertMaskedLM, but I don't understand this line: `weights = torch.load('./models/neural_attention_aligner_forwardsum_10ms_true_quantizer.pt').state_dict()` (from here). It looks like it loads neural_attention_aligner_forwardsum_10ms_true_quantizer.pt, but I don't have that weight file, so I'm wondering what it is.

funnymdzz · Aug 28 '23

Oh, sorry for the insufficient documentation. The quantizer is the quantizer for wav2vec2; I just loaded the weights of the wav2vec2 quantizer there. For Mandarin Chinese there might not be a pretrained model, so you might need to train it from scratch.
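For example, with an English pretraining checkpoint on the Hugging Face hub, extracting the quantizer weights could look like the following sketch (assuming the aligner's quantizer matches transformers' `Wav2Vec2GumbelVectorQuantizer`; the output filename is illustrative):

```python
# Sketch: pull the quantizer weights out of a pretrained wav2vec2 checkpoint.
# Assumes the aligner's quantizer has the same architecture as transformers'
# Wav2Vec2GumbelVectorQuantizer; the output filename is illustrative.
import torch
from transformers import Wav2Vec2ForPreTraining

model = Wav2Vec2ForPreTraining.from_pretrained("facebook/wav2vec2-base")
quantizer_state = model.quantizer.state_dict()  # Wav2Vec2GumbelVectorQuantizer
torch.save(quantizer_state, "wav2vec2_quantizer.pt")

# The saved weights could then be loaded into the aligner's quantizer, e.g.:
# aligner.quantizer.load_state_dict(torch.load("wav2vec2_quantizer.pt"))
```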

lingjzhu · Aug 30 '23

Thanks for your work and reply. I notice that you loaded the weights of the wav2vec2 quantizer, but I cannot find it. Where is the quantizer path?

startreker-shzy · Oct 11 '23

I am also trying to use W2V2-FS to replace MFA. Have you made any progress? What about the performance on Chinese data?

startreker-shzy · Oct 11 '23