Bharat Venkitesh
I see that there is full int8 support (both weights and activations) for BERT, but it's not clear to me what is supported for GPT models ([here](https://github.com/NVIDIA/FasterTransformer/blob/main/examples/pytorch/gpt/utils/parallel_gpt.py#L28)). Ideally if we can...
When calculating the log likelihood of the token at position i, we should consider the logits at step i-1; also, the log likelihood of the starting token is undefined (can be set...
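The shift described above can be sketched in a few lines. This is a minimal illustration, not code from FasterTransformer: `token_log_likelihoods` is a hypothetical helper, and it assumes `logits[i]` holds the model's prediction for the token at position i+1, so the score of `token_ids[i]` is read from `logits[i-1]` and the first token gets a placeholder value.

```python
import numpy as np

def token_log_likelihoods(logits, token_ids):
    """Per-token log likelihoods using the step i-1 logits.

    logits:    array of shape (seq_len, vocab_size); logits[i] is the
               model's prediction for the token at position i+1.
    token_ids: sequence of length seq_len with the actual token ids.

    The log likelihood of token_ids[i] comes from logits[i-1]; the
    starting token has no defined likelihood, so 0.0 is used here as
    an arbitrary placeholder.
    """
    logits = np.asarray(logits, dtype=np.float64)
    # Numerically stable log-softmax over the vocabulary axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

    out = [0.0]  # placeholder for the undefined starting token
    for i in range(1, len(token_ids)):
        out.append(float(log_probs[i - 1, token_ids[i]]))
    return out

# Example: uniform logits over a 2-token vocabulary, so every
# predicted token has probability 0.5 (log likelihood log 0.5).
scores = token_log_likelihoods([[0.0, 0.0], [0.0, 0.0]], [0, 1])
```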
Noticed a small drop in performance (
I could not find this in the docs; adding token streaming support during generation for GPT models would be great.
Versions: TensorFlow 2.3.0-rc1, CUDA 10, TensorRT 6. I am trying to convert a GPT-2 model; the saved model size is about 1.9 GB. It causes an issue when I try to use TF...