bhsueh_NV
Closing this bug because it is inactive. Feel free to re-open this issue if you still have any problems.
1. It is possible. When you convert the model successfully, the model configuration is saved in a config file and the backend reads it, as shown [here](https://github.com/triton-inference-server/fastertransformer_backend/blob/main/all_models/t5/fastertransformer/1/config.ini). 2. Please...
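For reference, a minimal sketch of how you could inspect such a converted config with Python's standard `configparser`; the file path is taken from the link above, and the code makes no assumption about which sections or keys the converter actually writes.

```python
# Minimal sketch: inspect the config.ini written by the weight converter.
# It only lists whatever sections/keys exist; check the file produced by
# your own conversion for the real layout.
import configparser

config = configparser.ConfigParser()
config.read("all_models/t5/fastertransformer/1/config.ini")

for section in config.sections():
    print(f"[{section}]")
    for key, value in config[section].items():
        print(f"  {key} = {value}")
```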
Closing this bug because it is inactive. Feel free to re-open this issue if you still have any problems.
It requires modifying the source code.
If you use the decoder op, the output is the result of the transformer blocks. If you use the decoding op, you need to modify the op and the FT source code.
We don't verify on all possible GPUs, but it should work. If you encounter an error, you are welcome to file a bug here.
Closing this bug because it is inactive. Feel free to re-open this issue if you still have any problems.
Currently, we don't see an obvious accuracy drop when running FP16 on a BF16 model. We also support BF16 on GPT; you can save the model as FP32 and run inference in BF16...
bfloat16 computation is supported in most models in the latest release. Because we don't have a good way to save the bfloat16 weights yet, you still need to store the model...
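As a rough illustration of the suggested workflow (store FP32 weights, compute in bfloat16), here is a hedged PyTorch sketch; the file name `model_fp32.bin` and the cast-at-load step are illustrative assumptions, not the FasterTransformer converter or runtime API.

```python
# Sketch of "store FP32 weights, run inference in bfloat16".
# File names and the casting step are illustrative only.
import torch

# Weights are stored on disk in FP32...
state_dict_fp32 = torch.load("model_fp32.bin", map_location="cpu")

# ...and cast to bfloat16 only when loaded for inference.
state_dict_bf16 = {
    name: tensor.to(torch.bfloat16) if tensor.is_floating_point() else tensor
    for name, tensor in state_dict_fp32.items()
}

# Quick check of the resulting dtypes for a few parameters.
print({name: t.dtype for name, t in list(state_dict_bf16.items())[:3]})
```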