Merge multi-band linear prediction into LPCNet

Open GuangChen2016 opened this issue 4 years ago • 15 comments

As reported in https://arxiv.org/pdf/2005.05551.pdf, the multi-band method enables the vocoder to generate several speech samples in parallel in one step, which significantly improves the efficiency of speech synthesis. I ran into some problems when merging it into LPCNet. Has anyone tried to merge multi-band linear prediction into LPCNet?

GuangChen2016 avatar Jul 15 '20 11:07 GuangChen2016

@GuangChen2016 Would you mind sharing what kind of problem you ran into when implementing multi-band LPCNet?

shangqwe123 avatar Jul 28 '20 09:07 shangqwe123

Did you apply the multi-band method to LPCNet successfully? My generated wavs are wrong, and I doubt the correctness of my LPC coefficient calculation. Would you mind leaving an email or other contact information? I can share the details with you.

GuangChen2016 avatar Jul 29 '20 14:07 GuangChen2016

@GuangChen2016

Could you explain how you extract the "M order linear prediction coefficients of each sub frequency band" (the multi-band linear prediction coefficients)?

Also, are you using an 80-dim mel-spectrogram to compute the LPC (with a Python library such as librosa)? I think the LPC in the original LPCNet code is extracted from the 18-dim BFCC (BFCC -> PSD -> autocorrelation -> LPC).
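A minimal sketch of the last two steps of that chain (PSD -> autocorrelation -> LPC) may help when checking a subband LPC calculation. It is not the LPCNet code: the function names `autocorr_from_psd` and `levinson_durbin` are hypothetical, the PSD is assumed to be already on a linear frequency grid from 0 to Nyquist, and no lag windowing or noise floor is applied.

```python
import math

def autocorr_from_psd(psd, order):
    """Autocorrelation lags 0..order from a one-sided power spectrum.

    Assumes psd holds N/2+1 evenly spaced bins from 0 to Nyquist.
    Uses the Wiener-Khinchin relation: the autocorrelation is the
    inverse DFT of the (symmetric) power spectrum.
    """
    half = len(psd) - 1
    n_fft = 2 * half
    r = []
    for k in range(order + 1):
        # DC and Nyquist bins appear once, all other bins twice.
        acc = psd[0] + psd[half] * math.cos(math.pi * k)
        for m in range(1, half):
            acc += 2.0 * psd[m] * math.cos(2.0 * math.pi * k * m / n_fft)
        r.append(acc / n_fft)
    return r

def levinson_durbin(r, order):
    """Levinson-Durbin recursion on autocorrelation lags r[0..order].

    Returns (a, err) with a[0] = 1, so that the prediction is
    x_hat[n] = -sum(a[j] * x[n-j] for j in 1..order).
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # degenerate input, stop early
        acc = r[i]
        for j in range(1, i):
            acc += a[j] * r[i - j]
        k = -acc / err  # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err
```

With this sign convention, `a[1:]` would be used the same way as the coefficients in a loop of the form `p -= lpc[j] * mem[j]`. A mel-spectrogram would first have to be mapped back to a linear-frequency PSD, which this sketch does not cover.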

ohleo avatar Aug 11 '20 02:08 ohleo

I've tried it, but got impulses. How do I deal with that? My 2nd band is bad (impulses), but the others work well.

zhuxiaoxuhit avatar Sep 18 '20 10:09 zhuxiaoxuhit

@zhuxiaoxuhit Yeah, I have sent an email to you.

GuangChen2016 avatar Sep 25 '20 06:09 GuangChen2016

@GuangChen2016 Is your multi-band LPCNet effective? I recently tried experimenting with PyTorch, but it had no effect.

yanggeng1995 avatar Dec 12 '20 13:12 yanggeng1995

Did you apply this paper? https://indico2.conference4me.psnc.pl/event/35/contributions/3023/attachments/694/732/Thu-1-1-6.pdf. I think it will be faster than FeatherWave.

zhuxiaoxuhit avatar Dec 18 '20 06:12 zhuxiaoxuhit

I have achieved initial success with FeatherWave, but some individual samples are unstable. I initially suspected a problem with calculating the subband LPC from the mel-spectrogram, but I have not found the cause yet.

yanggeng1995 avatar Dec 19 '20 05:12 yanggeng1995

@yanggeng1995 Asking for help! For multi-band LPCNet, the features of each band should be extracted separately, right? If I use the original LPCNet features, should I first get the subband wavs (4 bands), then extract features for each subband wav separately, giving 55 x 4 features in total? Should I also change my acoustic model to predict the 55*4 features?

Another question: how do I get the subband LPCs from mels?

Liujingxiu23 avatar Dec 10 '21 09:12 Liujingxiu23

No, no, it's not like that. The LPC of each subband is calculated from the full-band parameters. In fact, you can use the full-band LPC as the shared LPC for all subbands, so you only need to change the structure of LPCNet to predict multiple subbands at the same time; the acoustic model still predicts the 55-dimensional full-band parameters.
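As a rough illustration of the shared-LPC idea, the sketch below applies one set of full-band coefficients to the sample history of every subband and returns one prediction per band. The function name `predict_subbands` and the data layout are hypothetical, chosen only for this example; it is not taken from any of the implementations discussed here.

```python
def predict_subbands(shared_lpc, subband_histories):
    """One linear prediction per subband from a single shared LPC set.

    shared_lpc[j] weights the sample j+1 steps in the past.
    subband_histories[b][0] is the most recent sample of band b.
    Both layouts are assumptions made for this sketch.
    """
    preds = []
    for hist in subband_histories:
        p = 0.0
        for coef, past in zip(shared_lpc, hist):
            p -= coef * past  # same sign convention as p -= lpc[j] * mem[j]
        preds.append(p)
    return preds
```

For example, `predict_subbands([-0.5, 0.25], [[2.0, 4.0], [1.0, 0.0]])` gives one prediction for each of the two bands, both computed from the same two coefficients.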

yanggeng1995 avatar Dec 13 '21 02:12 yanggeng1995

@yanggeng1995 I tried it the way you said: all four subbands share the same full-band LPC. But in the data dumping stage, I mean "write_audio" in dump_data.c, "p" is computed as follows:

for (j=0;j<LPC_ORDER;j++) p -= st->features[k][2*NB_BANDS+3+j]*st->sig_mem[j];

st->features[k][2*NB_BANDS+3+j]: the feature of the full band
st->sig_mem[j]: the PCM samples of the subband

Sometimes p > 32768.

Did you not run into a similar problem? Or did I do something wrong?

Liujingxiu23 avatar Dec 13 '21 02:12 Liujingxiu23

Is this calculation done for each subband? If so, do you apply pre-emphasis to the full-band audio? If you use a shared LPC, you need to apply pre-emphasis to the full-band audio data. It is best to normalize wav by the way, otherwise p > 32768 is likely to occur.

yanggeng1995 avatar Dec 13 '21 03:12 yanggeng1995

@yanggeng1995 Yes, I apply pre-emphasis to the full-band audio. But what do you mean by "It is best to normalize wav by the way"?

Liujingxiu23 avatar Dec 13 '21 03:12 Liujingxiu23

What is your pre-emphasis factor set to? In my experiments, I found that 0.6 is a better choice. What I mean is that you can normalize the audio, but its effect is small; pre-emphasis has a greater effect.
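For reference, the two preprocessing steps mentioned here can be sketched as follows. This is only an illustration under assumptions: the function names `pre_emphasis` and `peak_normalize` are hypothetical, the filter is the usual first-order form y[n] = x[n] - coef * x[n-1], and peak scaling is just one possible reading of "normalize".

```python
def pre_emphasis(samples, coef):
    """First-order pre-emphasis: y[n] = x[n] - coef * x[n-1].

    The sample before the first one is taken as 0 (an assumption).
    """
    out = []
    prev = 0.0
    for x in samples:
        out.append(x - coef * prev)
        prev = x
    return out

def peak_normalize(samples, target_peak):
    """Scale the signal so its largest absolute sample equals target_peak."""
    peak = max((abs(x) for x in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # silent or empty input, nothing to scale
    gain = target_peak / peak
    return [x * gain for x in samples]
```

A call such as `pre_emphasis(peak_normalize(wav, 16000.0), 0.6)` would apply both steps to the full-band audio before any subband split; the order of the two steps and the target peak of 16000 are assumptions, not something stated in the thread.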

yanggeng1995 avatar Dec 14 '21 00:12 yanggeng1995

@yanggeng1995 I used the original parameter, 0.85, for pre-emphasis. I will change it to 0.6 and also check my data processing to see if anything is wrong. Thank you so much for your reply.

Liujingxiu23 avatar Dec 14 '21 01:12 Liujingxiu23