FasterTransformer
[Bugfix] GptJ & GptNeoX batch inference error
GptJ & GptNeoX may generate random outputs when running batch inference without a prefix prompt. The problem is caused by the nullptr check at https://github.com/NVIDIA/FasterTransformer/blob/f8e42aac45815c5be92c0915b12b9a6652386e8c/src/fastertransformer/kernels/gpt_kernels.cu#L1064
I think this duplicates the fix in #716, which is more elegant and efficient.