29 comments of Kiran R

@JoeREISys I ran the same script in Colab and I'm getting the following results, so maybe it's a device issue. ``` Downloading: 100% 1.43k/1.43k [00:00

Thank you! @tobigue was able to export `mbart` to ONNX, so he might be able to help.

Cool! I also had some issues with `1.7.0`: while using `onnxruntime==1.7.0` for quantizing, it created extra models; here's the [issue](https://github.com/microsoft/onnxruntime/issues/6888). Applying `optimize_model=False` fixed it.
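In case it helps, this is roughly the call that worked for me in that onnxruntime version (file names are placeholders, not the exact fastT5 code):

```python
from onnxruntime.quantization import quantize_dynamic, QuantType

# rough sketch: skip onnxruntime's own graph optimization during
# quantization so no extra optimized model files get written alongside
# the quantized one. file names below are placeholders.
quantize_dynamic(
    model_input="t5-encoder.onnx",
    model_output="t5-encoder-quantized.onnx",
    weight_type=QuantType.QUInt8,
    optimize_model=False,  # this is what avoided the extra models for me
)
```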

Constant folding replaces some of the operations that have all-constant inputs; it's not clear why it creates the embedding twice in BART. In T5 I did not face any issue with...
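A toy illustration of the kind of thing that gets folded (the module and file name are made up for the example):

```python
import torch

class Toy(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.randn(4, 3))

    def forward(self, x):
        # weight.t() only has a constant input (the parameter), so with
        # do_constant_folding=True the exporter can precompute the
        # transposed weight and store it as an initializer instead of
        # keeping a Transpose node in the exported graph.
        return x @ self.weight.t()

torch.onnx.export(Toy(), torch.randn(2, 3), "toy.onnx", do_constant_folding=True)
```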

Also I noticed that in the notebook:

```python
input_names = [x.name for x in self.decoder.get_inputs()]
inputs = [
    input_ids.cpu().numpy(),
    attention_mask.cpu().numpy(),
] + [tensor.cpu().numpy() for tensor in flat_past_key_values]
decoder_inputs...
```

> It only happens for the init_decoder and I saw that in fastT5 you do not do constant folding for the init decoder (only for encoder and decoder). https://github.com/Ki6an/fastT5/blob/master/fastT5/onnx_exporter.py#L196

You're right...
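For reference, it's just the `do_constant_folding` flag of `torch.onnx.export` that is switched off for that one export; a minimal sketch (the stand-in module, inputs, and file name are placeholders, not the real fastT5 code):

```python
import torch

# sketch only: export the init decoder without constant folding, while the
# encoder/decoder exports keep it enabled. `InitDecoderStandIn` is a dummy
# stand-in, not the real init decoder.
class InitDecoderStandIn(torch.nn.Module):
    def forward(self, input_ids, encoder_hidden_states):
        return encoder_hidden_states.sum(dim=-1) + input_ids.float()

torch.onnx.export(
    InitDecoderStandIn(),
    (torch.ones(1, 4, dtype=torch.long), torch.randn(1, 4, 8)),
    "init-decoder.onnx",
    opset_version=12,
    do_constant_folding=False,  # off only for the init decoder
    input_names=["input_ids", "encoder_hidden_states"],
    output_names=["logits"],
)
```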

For GPU you can use the [`onnxruntime-gpu`](https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html#contents) library, but it does not support quantization, so you won't have the advantage of reduced model size during inference. [Here's](https://github.com/microsoft/onnxruntime/blob/dfe42e185c6c6de68177db8ecf307645ce831aec/onnxruntime/python/tools/transformers/notebooks/PyTorch_Bert-Squad_OnnxRuntime_GPU.ipynb) an example implementation...
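The basic idea is just to create the session with the CUDA execution provider; a rough sketch (the model path is a placeholder):

```python
import onnxruntime as ort

# rough sketch: run an exported model on GPU via onnxruntime-gpu.
# onnxruntime falls back to CPU if CUDA isn't available.
session = ort.InferenceSession(
    "t5-encoder.onnx",  # placeholder path to an exported model
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
print(session.get_providers())  # confirm CUDAExecutionProvider is active
```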

Are you using it on GPU?

Sorry, the library does not support GPU yet, but the issue looks similar to https://github.com/microsoft/onnxruntime/issues/3113. Are you facing the same issue on CPU?

It looks like the issue is in onnxruntime itself; I suggest you create an issue there.