Results 4 comments of Hui Chen

I encountered the same problem on DeepSpeed version 0.14.2.

Have you tried adding ```predictions = np.argmax(predictions, axis=-1)``` before decoding? The current prediction shape looks like (batch_size, length, vocabulary_size), i.e. one score per vocabulary entry at each position, while decoding requires token ids with a shape of (batch_size, length).
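
Here is a minimal sketch of what that line does, assuming the predictions are raw logits held in a NumPy array. The sizes and the random `logits` array are made up for illustration and are not taken from the original report.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
batch_size, length, vocabulary_size = 2, 5, 11

rng = np.random.default_rng(0)
# Stand-in for the model output: one score per vocabulary entry at each position.
logits = rng.normal(size=(batch_size, length, vocabulary_size))

# Keep only the index of the highest-scoring token at each position.
predictions = np.argmax(logits, axis=-1)

print(logits.shape)       # (2, 5, 11)
print(predictions.shape)  # (2, 5)
```

After the `argmax`, `predictions` holds integer token ids rather than scores, which is the form a decoding step would normally expect.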

Hi @JiuhaiChen, I also ran into this issue. Did you manage to solve it?