Merge multi-band linear prediction into LPCNet
As reported in https://arxiv.org/pdf/2005.05551.pdf, the multi-band method enables the vocoder to generate several speech samples in parallel at each step, which significantly improves the efficiency of speech synthesis. I ran into some problems when merging it into LPCNet. Has anyone tried to merge multi-band linear prediction into LPCNet?
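For context, multi-band schemes of this kind split the full-band signal into critically sampled sub-bands with a PQMF analysis filter bank and recombine them after synthesis, so the network emits one sample per band at each step. Below is a minimal numpy/scipy sketch of a 4-band PQMF; the design constants (62 taps, cutoff 0.142, Kaiser beta 9.0) are taken from common open-source multi-band vocoder implementations, not from the paper itself.

import numpy as np
from scipy.signal import firwin, lfilter

class PQMF:
    """Near-perfect-reconstruction pseudo-QMF bank (cosine-modulated)."""
    def __init__(self, bands=4, taps=62, cutoff=0.142, beta=9.0):
        self.bands = bands
        proto = firwin(taps + 1, cutoff, window=("kaiser", beta))  # prototype lowpass
        n = np.arange(taps + 1)
        k = np.arange(bands)[:, None]
        phase = (2 * k + 1) * (np.pi / (2 * bands)) * (n - taps / 2)
        self.h = 2 * proto * np.cos(phase + (-1) ** k * np.pi / 4)  # analysis filters
        self.g = 2 * proto * np.cos(phase - (-1) ** k * np.pi / 4)  # synthesis filters

    def analysis(self, x):
        # Band-pass filter the full-band signal, then decimate by the number of bands.
        return np.stack([lfilter(self.h[b], [1.0], x)[::self.bands]
                         for b in range(self.bands)])

    def synthesis(self, subbands):
        # Upsample each band, filter with the matching synthesis filter, and sum.
        out = np.zeros(subbands.shape[1] * self.bands)
        for b in range(self.bands):
            up = np.zeros(subbands.shape[1] * self.bands)
            up[::self.bands] = subbands[b]
            out += lfilter(self.g[b], [1.0], up)
        return self.bands * out

With 16 kHz audio and 4 bands, the sample-rate network then only has to run at 4 kHz per band.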
@GuangChen2016 Would you mind sharing what kind of problems you ran into when implementing multi-band LPCNet?
Did you manage to apply the multi-band approach to LPCNet? My generated wavs are wrong and I suspect my LPC coefficient calculation is incorrect. Would you mind leaving an email or other contact information so I can share it with you?
@GuangChen2016
Could you explain how you extract the "M-order linear prediction coefficients of each sub frequency band" (the multi-band LPC coefficients)?
Also, do you use an 80-dim mel-spectrogram to compute the LPC (with a Python library such as librosa)? I think the LPC in the original LPCNet code is extracted from the 18-dim BFCC (BFCC -> PSD -> autocorrelation -> LPC).
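For reference, the last step of that pipeline is just a Levinson-Durbin recursion on the autocorrelation; a minimal numpy sketch (the sign convention of the stored coefficients may differ from the one used in dump_data.c, so check before mixing the two):

import numpy as np

def autocorrelation_from_psd(psd):
    # Autocorrelation as the inverse real FFT of the (linear-frequency) power spectrum.
    return np.fft.irfft(psd)

def lpc_from_autocorrelation(r, order=16):
    # Levinson-Durbin recursion: find a[1..order] minimising the prediction
    # error x[n] + sum_j a[j] * x[n-j].
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a[1:]  # a[0] == 1 is implicit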
I've tried it, but I get impulses. How did you deal with that? My 2nd band is bad (impulses), but the others work well.
@zhuxiaoxuhit Yeah, I have sent an email to you.
@GuangChen2016 Does your multi-band LPCNet work? I recently tried to implement it in PyTorch, but it didn't work.
Did you implement this paper? https://indico2.conference4me.psnc.pl/event/35/contributions/3023/attachments/694/732/Thu-1-1-6.pdf It should be faster than FeatherWave, I think.
I have had initial success with FeatherWave, but a few individual samples are unstable. I initially suspected a problem with computing the sub-band LPC from the mel-spectrogram, but I have not found the cause yet.
@yanggeng1995 Asking for help! For multi-band LPCNet, should the features of each band be extracted separately? If I use the original LPCNet features, should I first split the wav into sub-band wavs (4 bands), then extract features for each sub-band wav separately, giving 55 x 4 features in total? Would I also have to change my acoustic model to predict the 55 x 4 features?
Another question: how do you get the sub-band LPCs from mels?
No, it's not like that. The sub-band LPC is calculated from the full-band parameters. In fact, you can use the full-band LPC as a shared LPC for all the sub-bands, so you only need to change the LPCNet structure to predict multiple sub-bands at the same time; the acoustic model still predicts the 55-dimensional full-band parameters.
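For what it's worth, here is a rough PyTorch sketch of the output side of such a change (the class name and sizes are my own guesses, not the paper's exact architecture): the frame-rate network and GRU stay as they are, and one dual-FC-style head per sub-band reads the same GRU state, so the four sub-band samples of a step are sampled in parallel.

import torch
import torch.nn as nn

class MultiBandSampleHead(nn.Module):
    # One dual-FC-style output per sub-band, all fed from the shared GRU state.
    def __init__(self, rnn_dim=384, bands=4, levels=256):
        super().__init__()
        self.heads = nn.ModuleList(
            [nn.Sequential(nn.Linear(rnn_dim, 256), nn.Tanh(), nn.Linear(256, levels))
             for _ in range(bands)])

    def forward(self, gru_state):
        # gru_state: (batch, rnn_dim) -> (batch, bands, levels), i.e. one
        # categorical distribution over mu-law levels per sub-band.
        return torch.stack([head(gru_state) for head in self.heads], dim=1)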
@yanggeng1995 I tried it the way you said: all four sub-bands share the LPC of the full band. But in the data-dumping stage, i.e. "write_audio" in dump_data.c, the prediction p is computed as follows:

for (j=0;j<LPC_ORDER;j++) p -= st->features[k][2*NB_BANDS+3+j]*st->sig_mem[j];

where st->features[k][2*NB_BANDS+3+j] is the full-band LPC feature and st->sig_mem[j] holds the past PCM samples of the sub-band.
Sometimes p exceeds 32768.
Did you not run into a similar problem, or am I doing something wrong?
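For comparison, the same computation in numpy; the int16 clamp is just a guard I am adding here to show where the overflow bites when a full-band LPC is applied to sub-band PCM, not something taken from dump_data.c:

import numpy as np

def lpc_predict(lpc, sig_mem):
    # p = -sum_j lpc[j] * sig_mem[j], with sig_mem holding the most recent
    # sub-band samples (same loop as in write_audio, just vectorised).
    p = -np.dot(lpc, sig_mem)
    # Guard against leaving the 16-bit range before mu-law quantisation.
    return float(np.clip(p, -32767.0, 32767.0))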
Is this calculation done for each sub-band? If so, do you apply pre-emphasis to the full-band audio? If you use a shared LPC, you need to apply pre-emphasis to the full-band audio data. It is also best to normalize the wav, otherwise p > 32768 happens easily.
@yanggeng1995 Yes, I apply pre-emphasis to the full-band audio. But what do you mean by "it is best to normalize the wav"?
What is your pre-emphasis factor set to? In my experiments I found 0.6 to be a better choice. I mean you can normalize the audio, but its effect is small; pre-emphasis has a much larger effect.
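For concreteness, the pre-emphasis discussed here is just a first-order filter applied to the full-band signal before analysis and undone after synthesis; a minimal scipy sketch using the 0.6 factor suggested above:

from scipy.signal import lfilter

def preemphasis(x, coef=0.6):
    # y[n] = x[n] - coef * x[n-1]
    return lfilter([1.0, -coef], [1.0], x)

def deemphasis(y, coef=0.6):
    # Inverse filter 1 / (1 - coef * z^-1)
    return lfilter([1.0], [1.0, -coef], y)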
@yanggeng1995 I used the original parameter, 0.85, for pre-emphasis. I will change it to 0.6 and also check my data processing to see if anything is wrong. Thank you so much for your reply.