FastSpeech2

Need help converting FastSpeech model to ONNX to run on Tensor RT

Open EuphoriaCelestial opened this issue 3 years ago • 17 comments

Hi, I have my FastSpeech model trained and working well, and I want to improve its speed by running the model on TensorRT (and maybe convert the preprocessing code to C++ later). Currently I am following this example to export an ONNX model file: https://docs.microsoft.com/en-us/windows/ai/windows-ml/tutorials/pytorch-convert-model but I don't know how to create the dummy input. Can someone help me with this? Thanks.

EuphoriaCelestial avatar Oct 08 '21 03:10 EuphoriaCelestial

Dummy inputs are tensors of the shape and dtype the model expects, filled with either random values or zeros.

Did you manage to complete the ONNX conversion? It seems that the torch.bucketize operator is not currently supported (PyTorch 1.8, ONNX opset 13).

FasoCA avatar Oct 30 '21 01:10 FasoCA
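For reference, "tensors of the shape the model expects" might look like the following for a FastSpeech2-style model. This is a minimal sketch, not code from this repo: the input names, vocabulary size, and sequence length are hypothetical, and the actual `torch.onnx.export` call would need to match the model's real forward signature.

```python
import torch

batch, max_src_len = 1, 50  # hypothetical sizes for tracing

# Phoneme IDs: integer tensor, values only need to be valid indices.
dummy_texts = torch.randint(0, 100, (batch, max_src_len), dtype=torch.long)
# Per-sample source lengths.
dummy_src_lens = torch.tensor([max_src_len], dtype=torch.long)

# Export sketch (uncomment once `model` exists and the argument
# order matches its forward()):
# torch.onnx.export(model, (dummy_texts, dummy_src_lens), "fastspeech2.onnx",
#                   input_names=["texts", "src_lens"], opset_version=13)
```

The values themselves don't matter for tracing; only the shapes and dtypes do, because the exporter records the ops executed on these tensors.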

@FasoCA Yes, I also had that error with bucketize. I don't remember exactly how I fixed it, but it was a temporary workaround and I am not sure it was the right method. I have finished the conversion for both the FastSpeech model and the vocoder model, but there is a warning because there are if-else clauses inside the forward method of the FastSpeech model; the tracer is not able to follow if-else clauses. The vocoder conversion completed with no errors. The whole pipeline is still able to run with the 2 converted models, but it hits errors in some special cases. So far I am not using the converted FastSpeech model, just the vocoder, so my pipeline includes the FastSpeech PyTorch model and the HiFiGAN TensorRT model. I am still using Python and will consider converting to C++ later.

EuphoriaCelestial avatar Oct 30 '21 03:10 EuphoriaCelestial

@EuphoriaCelestial Much appreciated, thanks for the reply. I've also been working exclusively in Python so far.

To get around the lack of torch.bucketize support, one could write a custom ONNX operator in C++ (maybe following what's described here: https://github.com/onnx/tutorials/blob/master/PyTorchCustomOperator/README.md though I have never done it), rewrite functionally equivalent operations in Python and swap them in for bucketize, or somehow skip that section of the code entirely (if possible). Not sure which route would be best. Do you recall your solution?

Thanks for the heads-up about the if-clause. I think it's a branch on training vs. inference, correct? In that case, one could generate separate models for the two cases. Is this what you are referring to when you talk about "2 converted models"?

FasoCA avatar Oct 30 '21 23:10 FasoCA
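The "functionally equivalent operations" route mentioned above can be sketched as a comparison-and-sum: instead of a search, count how many bin boundaries each value exceeds. This lowers to plain comparison and reduce ops that every recent opset supports. A sketch in NumPy to show the idea (the in-model version would use the equivalent torch expression, e.g. `torch.sum(prediction.unsqueeze(-1) > self.pitch_bins, dim=-1)`):

```python
import numpy as np

def bucketize_equivalent(x, boundaries, right=False):
    """ONNX-friendly stand-in for torch.bucketize.

    `boundaries` must be sorted, as torch.bucketize requires.
    right=False counts boundaries strictly below each value;
    right=True counts boundaries less than or equal to it.
    """
    x = np.asarray(x)
    b = np.asarray(boundaries)
    if right:
        return (x[..., None] >= b).sum(axis=-1)
    return (x[..., None] > b).sum(axis=-1)
```

Note this materializes a `(*x.shape, len(boundaries))` boolean tensor, which is fine for the ~256 pitch/energy bins FastSpeech2 uses but would not scale to huge bin counts.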

Do you recall your solution?

I followed my friend's suggestion and hard-coded a fix for bucketize as below (this is the else clause in get_pitch_embedding and get_energy_embedding). I don't have deep knowledge of this, so it is pure trial and error; tell me if this is wrong.

```python
prediction = prediction * control
buck = torch.zeros_like(prediction)
buck[:] = 255  # every value is forced into the last bucket
buck = buck.type(torch.long)
# note: .to() is not in-place, so the result must be reassigned
buck = buck.to(torch.device("cuda" if torch.cuda.is_available() else "cpu"))
embedding = self.pitch_embedding(buck)
```

EuphoriaCelestial avatar Nov 02 '21 04:11 EuphoriaCelestial

I think it's a branch on training vs. inference, correct?

no, take a look at the forward function in the model; there are many if-else clauses inside. When I convert to ONNX, it says it is unable to trace the data flow through them, so the result may be wrong.

In which case, one could generate separate models for the two cases. Is this what you are referring to, when you talk about "2 converted models"?

no, the 2 models I am mentioning are the FastSpeech model and the vocoder model (HiFiGAN or MelGAN); currently I have only converted the vocoder model.

EuphoriaCelestial avatar Nov 02 '21 04:11 EuphoriaCelestial

I followed my friend's suggestion and hard-coded a fix for bucketize as below (this is the else clause in get_pitch_embedding and get_energy_embedding). I don't have deep knowledge of this, so it is pure trial and error; tell me if this is wrong.

I see, so the idea is to replace bucketize with a dummy tensor of equivalent size and type in the calls to self.pitch_embedding and self.energy_embedding when the ONNX graph is generated. Makes sense, I'll give it a try, thank you!

FasoCA avatar Nov 02 '21 05:11 FasoCA

mark

Pydataman avatar Dec 29 '21 06:12 Pydataman

@EuphoriaCelestial So, did you convert to TRT successfully? I am hitting a problem going from ONNX to TRT. My error is: Error Code 4: Internal Error (Network must have at least one output)

Tian14267 avatar Jan 14 '22 05:01 Tian14267

@EuphoriaCelestial So, did you convert to TRT successfully? I am hitting a problem going from ONNX to TRT. My error is: Error Code 4: Internal Error (Network must have at least one output)

sadly, no. I can make it run successfully with no errors popping up, but the generated sound contains only noise, and the run time is not even reduced, so it is a total failure.

EuphoriaCelestial avatar Jan 17 '22 03:01 EuphoriaCelestial

@EuphoriaCelestial So, did you convert to TRT successfully? I am hitting a problem going from ONNX to TRT. My error is: Error Code 4: Internal Error (Network must have at least one output)

sadly, no. I can make it run successfully with no errors popping up, but the generated sound contains only noise, and the run time is not even reduced, so it is a total failure.

Maybe it's the precision. Could you share your method for going from ONNX to TRT? I really want to figure it out. Thank you very much.

Tian14267 avatar Jan 17 '22 09:01 Tian14267
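Before blaming precision, a useful step is to feed the same input to the original PyTorch model and the converted one (ONNX Runtime or TensorRT) and compare the mel outputs numerically; a noise-only result usually means a large divergence somewhere in the graph, not a small precision drift. A minimal sketch, assuming both outputs have already been pulled back as NumPy arrays:

```python
import numpy as np

def report_mismatch(ref, test, atol=1e-3):
    """Compare a reference output (e.g. the PyTorch mel spectrogram)
    against a converted model's output on the same input."""
    ref = np.asarray(ref, dtype=np.float64)
    test = np.asarray(test, dtype=np.float64)
    diff = np.abs(ref - test)
    return {
        "max_abs_diff": float(diff.max()),
        "mean_abs_diff": float(diff.mean()),
        "close": bool(np.allclose(ref, test, atol=atol)),
    }
```

A max_abs_diff in the 1e-3 range points to FP16/precision effects; values of the same magnitude as the signal itself point to a structural problem such as an untraced branch or a baked-in sequence length.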

@EuphoriaCelestial I am sorry to disturb you, but I have a question. How did you handle dynamic input in FastSpeech2? I give different inputs, but the output of the ONNX model is problematic.

Tian14267 avatar Jan 18 '22 07:01 Tian14267
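One common cause of wrong outputs for varying input lengths is exporting without `dynamic_axes`, which bakes the dummy input's sequence length into the graph. A sketch of the mapping one might pass to `torch.onnx.export`; the tensor names here are hypothetical and must match the `input_names`/`output_names` used in the actual export call:

```python
# Hypothetical input/output names for a FastSpeech2-style export.
dynamic_axes = {
    "texts":    {0: "batch", 1: "src_len"},  # phoneme IDs, variable length
    "src_lens": {0: "batch"},
    "mel":      {0: "batch", 1: "mel_len"},  # output length varies too
}

# torch.onnx.export(model, dummy_inputs, "fastspeech2.onnx",
#                   input_names=["texts", "src_lens"], output_names=["mel"],
#                   dynamic_axes=dynamic_axes, opset_version=13)
```

Any axis not listed here is fixed at the size it had in the dummy input, which is exactly the silent length-mismatch failure described above.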

It's possible to convert the encoder and decoder to ONNX separately, but the VarianceAdaptor in the middle cannot be integrated. Has anyone successfully converted the whole model into a single ONNX file?

lucasjinreal avatar Jan 21 '22 08:01 lucasjinreal

It's possible to convert the encoder and decoder to ONNX separately, but the VarianceAdaptor in the middle cannot be integrated. Has anyone successfully converted the whole model into a single ONNX file?

what do you mean by the whole model in a single ONNX file? Converting the acoustic model and vocoder into one ONNX file, or just a single acoustic model?

Tian14267 avatar Jan 25 '22 06:01 Tian14267

@Tian14267 I have converted FastSpeech to ONNX. Has anyone been able to convert this model for TensorRT inference?

lucasjinreal avatar Jan 26 '22 03:01 lucasjinreal

mark

leslie2046 avatar Apr 10 '22 09:04 leslie2046

@jinfagang can you show your code for converting the model to ONNX? Thanks.

mollon650 avatar Apr 29 '22 03:04 mollon650

It's possible convert encoder and decoder to onnx separately, but the middle VarianceAdaptor not able to integrate, does anyone sucessfully converted whole part into single onnx?

@lucasjinreal would you be so kind as to share your conversion code and your insights?

javileyes avatar Apr 25 '24 11:04 javileyes