FastSpeech2
Need help converting FastSpeech model to ONNX to run on Tensor RT
Hi, I have my FastSpeech2 model trained and working well, and I want to improve its speed by running it on TensorRT (and maybe convert the preprocessing code to C++ later). Currently I am following this example to export an ONNX model file: https://docs.microsoft.com/en-us/windows/ai/windows-ml/tutorials/pytorch-convert-model But I don't know how to create the dummy input. Can someone help me with this? Thank you.
Dummy inputs are tensors of the shape and dtype the model expects, filled with either random values or zeros.
Did you manage to complete the ONNX conversion? It seems that the torch.bucketize operator is not currently supported (PyTorch 1.8, ONNX opset 13).
@FasoCA Yes, I also hit that error with bucketize. I don't remember exactly how I fixed it, but it was a temporary workaround and I am not sure it was the right method. I finished the conversion for both the FastSpeech2 model and the vocoder model, but there were warnings because of the if-else clauses inside the forward method of the FastSpeech2 model; the tracer cannot follow if-else branches. The vocoder conversion finished with no errors. The whole pipeline still runs with the two converted models, but it fails in some special cases, so for now I am not using the converted FastSpeech2 model, just the vocoder. My pipeline therefore combines the FastSpeech2 PyTorch model with a HiFi-GAN TensorRT model. I am still using Python and will consider converting to C++ later.
@EuphoriaCelestial Much appreciated, thanks for the reply. I've also been working exclusively in Python so far.
To get around the lack of torch.bucketize support, one could write a custom ONNX operator in C++ (maybe following what's described here: https://github.com/onnx/tutorials/blob/master/PyTorchCustomOperator/README.md, though I have never done it), rewrite functionally equivalent operations in Python and swap them in for bucketize, or somehow skip that section of the code entirely (if possible). I'm not sure which route would be best. Do you recall your solution?
Thanks for the heads-up about the if-else clauses. I think it's a branch on training vs. inference, correct? In that case, one could generate separate models for the two cases. Is this what you are referring to when you talk about "2 converted models"?
Do you recall your solution?
I followed my friend's suggestion and hard-fixed the bucketize call as below (this is the else-clause in get_pitch_embedding and get_energy_embedding). I don't have deep knowledge in this area, so this is pure trial and error; tell me if it is wrong.
prediction = prediction * control
# Hard fix: replace torch.bucketize with a constant bucket index (255)
# so the unsupported operator never appears in the exported graph.
buck = torch.zeros_like(prediction)
buck[:] = 255
buck = buck.type(torch.long)
# Note: .to() is not in-place; the result must be reassigned.
buck = buck.to(torch.device("cuda" if torch.cuda.is_available() else "cpu"))
embedding = self.pitch_embedding(buck)
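Note that this hard fix discards the predicted pitch/energy values entirely (every frame gets bucket 255), which changes the model's output. A drop-in that is functionally equivalent to torch.bucketize (with its default right=False) can be built from a comparison and a sum, both of which ONNX opset 13 supports. A sketch, with pitch_bins standing in for the model's boundary tensor:

```python
import torch


def bucketize_onnx_friendly(values, boundaries):
    # Equivalent to torch.bucketize(values, boundaries) for sorted 1-D
    # boundaries: each value's bucket index is the number of boundaries
    # strictly below it. Uses only Greater and ReduceSum, which export
    # cleanly to ONNX.
    return (values.unsqueeze(-1) > boundaries).sum(dim=-1)


# Quick check against the reference implementation.
pitch_bins = torch.linspace(-1.0, 1.0, steps=255)  # stand-in for self.pitch_bins
prediction = torch.randn(2, 30)

ours = bucketize_onnx_friendly(prediction, pitch_bins)
reference = torch.bucketize(prediction, pitch_bins)
print(torch.equal(ours, reference))  # → True
```

The trade-off is memory: the intermediate comparison tensor has shape (batch, frames, num_bins), which is usually fine for the ~256 bins FastSpeech2-style models use.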
I think it's a branch on training vs. inference, correct?
No, take a look at the forward function in the model; there are many if-else clauses inside. When I convert to ONNX, it says it is unable to trace the data flow through them, so the result may be wrong.
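The tracing problem described here is easy to reproduce on a toy function: torch.jit.trace records only the branch taken by the example input, while torch.jit.script compiles both branches (and torch.onnx.export can consume a scripted module). This is a generic illustration, not FastSpeech2's actual forward:

```python
import torch


def gate(x):
    # Data-dependent branch: tracing freezes whichever side
    # the example input happens to take.
    if x.sum() > 0:
        return x * 2
    return x + 10


traced = torch.jit.trace(gate, torch.ones(3))  # records only the "> 0" branch
scripted = torch.jit.script(gate)              # compiles both branches

neg = -torch.ones(3)
print(traced(neg))    # wrong: still multiplies by 2 -> tensor([-2., -2., -2.])
print(scripted(neg))  # correct: adds 10 -> tensor([9., 9., 9.])
```

PyTorch emits a TracerWarning for exactly this situation, which matches the warnings mentioned above.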
In which case, one could generate separate models for the two cases. Is this what you are referring to, when you talk about "2 converted models"?
No, the 2 models I am mentioning are the FastSpeech2 model and the vocoder model (HiFi-GAN or MelGAN); currently I have only converted the vocoder model.
I followed my friend's suggestion and hard-fixed the bucketize call as below (this is the else-clause in get_pitch_embedding and get_energy_embedding). I don't have deep knowledge in this area, so this is pure trial and error; tell me if it is wrong.
I see, so the idea is to replace bucketize with a dummy tensor of equivalent size and type in the calls to self.pitch_embedding and self.energy_embedding when the ONNX graph is generated. Makes sense, I'll give it a try, thank you!
mark
@EuphoriaCelestial So, did you convert to TRT successfully? I ran into a problem going from ONNX to TRT. My error is:
Error Code 4: Internal Error (Network must have at least one output)
Sadly, no. I can make it run successfully with no errors popping up, but the generated sound contains only noise and the runtime is not even reduced, so it was a total failure.
Maybe it's the precision. Can you share your method for going from ONNX to TRT? I really want to figure it out. Thank you very much.
@EuphoriaCelestial I am sorry to disturb you, but I have a question: how did you handle the dynamic input in FastSpeech2? I give different inputs, but the output of the ONNX model is problematic.
It's possible to convert the encoder and decoder to ONNX separately, but the VarianceAdaptor in the middle cannot be integrated. Has anyone successfully converted the whole model into a single ONNX file?
What do you mean by the whole model in a single ONNX file? Converting the acoustic model and the vocoder into one ONNX file, or just a single acoustic model?
@Tian14267 I have converted FastSpeech2 to ONNX. Has anyone been able to convert this model for TensorRT inference?
mark
@jinfagang Can you show your code for how to convert the model to ONNX? Thanks.
It's possible to convert the encoder and decoder to ONNX separately, but the VarianceAdaptor in the middle cannot be integrated. Has anyone successfully converted the whole model into a single ONNX file?
@lucasjinreal Would you be so kind as to share your conversion code and your insights?