Results 29 comments of Kiran R

> Is there any way to use for example BERT to implement same solution?

This [paper](https://www.aclweb.org/anthology/D19-5821.pdf) explains how to use BERT for QG, but I haven't found any code...

@pommedeterresautee Thanks! Do you plan on adding support for the Triton server?

Great! I tried T5 with cache (i.e. with `past_key_values`) on the Triton server. For generating every single token, the Python backend was making lots of requests (`24 pkv + 1 logits`...
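As a hedged back-of-the-envelope for that `24 pkv + 1 logits` count (assuming a t5-small decoder, which is not stated in the comment): each decoder layer caches four tensors, so 6 layers yield 24 past-key-value tensors, plus one logits tensor per step.

```python
# Assumption: t5-small, whose decoder has 6 layers. Each layer caches
# 4 tensors: self-attention key/value + cross-attention key/value.
num_layers = 6
tensors_per_layer = 4  # self-attn K, self-attn V, cross-attn K, cross-attn V

pkv_tensors = num_layers * tensors_per_layer  # 24 past-key-value tensors
total_outputs = pkv_tensors + 1               # + 1 logits tensor = 25 per step
```

This is why a naive per-tensor request pattern in the Python backend gets expensive: every generated token moves 25 tensors across the backend boundary.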

Great find, and thanks for fixing the bug! Sorry for replying late on this. As mentioned above, I'm trying to serve the T5 model from the Triton server. I have an...

Thanks for the response and the tip. The execution of the ONNX model part is slow:

```python
inference_request = pb_utils.InferenceRequest(
    model_name=self.model_path,
    requested_output_names=["logits"] + self.output_pkv_names,
    inputs=[input_ids, encoder_attention_mask, encoder_hidden_states] + input_past_key_values,
)
inference_response ...
```

> just to be sure, you are using 2 decoders and 1 encoder, right?

Yes.

> why do you need tensor.cuda() in get_output_tensors? It should already be on GPU....

You are getting this error because you've set `max_length=100` but sent `input_ids` of length 18. That's why it stops at 18%. If you set `max_length` to 18 you'll get...
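To illustrate the stopping behavior, here is a minimal greedy-decoding sketch (an illustration, not the actual T5/Triton code): generation loops until either an EOS token appears or the sequence already holds `max_length` tokens, so an input of length 18 with `max_length=18` produces nothing new.

```python
# Hypothetical minimal greedy loop illustrating the max_length cutoff.
# next_token_fn stands in for a model forward pass + argmax.
def greedy_generate(input_ids, next_token_fn, max_length, eos_token_id=1):
    ids = list(input_ids)
    while len(ids) < max_length:  # stop once max_length tokens exist
        tok = next_token_fn(ids)
        ids.append(tok)
        if tok == eos_token_id:   # or stop early on EOS
            break
    return ids

# With 18 input ids and max_length=18, the loop body never runs:
# the output equals the input, so generation appears to stop immediately.
out = greedy_generate(list(range(18)), lambda ids: 0, max_length=18)
```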

Sorry for the late reply. You can get the `hidden states` of the encoder easily, just by sending the `input_ids` and `attention_mask` to the encoder as shown below...
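The referenced snippet is truncated above. As a sketch of the usual Hugging Face pattern (assuming a torch `t5-small` checkpoint; the model name and prompt are placeholders, not from the comment):

```python
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration

# Placeholder checkpoint for illustration.
tok = AutoTokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

enc = tok("translate English to German: Hello", return_tensors="pt")
with torch.no_grad():
    # Run only the encoder once; its hidden states can be cached and
    # reused by the decoder at every generation step.
    encoder_outputs = model.get_encoder()(
        input_ids=enc.input_ids, attention_mask=enc.attention_mask
    )

hidden_states = encoder_outputs.last_hidden_state  # (batch, seq_len, d_model)
```

Caching this output is what lets the decoder loop avoid re-encoding the source sequence on every step.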

Yes, it is a bit confusing; it should be `model_name`, not `model_name_or_path`. I'll make this change in the next update. The purpose of `model_name` (the current `model_name_or_path`) is to select...

Can you provide the device specifications and the code you are using to test the speed?