PaddleNLP: CUDA error(719), unspecified launch failure during model inference

Please ask your question

When I run inference with the model produced by my SFT training, the following error occurs at random:

Error: ../paddle/phi/kernels/gpu/embedding_kernel.cu:41 Assertion `id < N` failed. Id should smaller than 2050 but received an id value: 2050.
Error: ../paddle/phi/kernels/gpu/embedding_kernel.cu:41 Assertion `id < N` failed. Id should smaller than 2050 but received an id value: 2050.
Error: ../paddle/phi/kernels/gpu/embedding_kernel.cu:41 Assertion `id < N` failed. Id should smaller than 2050 but received an id value: 2050.
Error: ../paddle/phi/kernels/gpu/embedding_kernel.cu:41 Assertion `id < N` failed. Id should smaller than 2050 but received an id value: 2050.
Traceback (most recent call last):
  File "/hy-tmp/PaddleNLP/llm/predictor.py", line 944, in <module>
    predict()
  File "/hy-tmp/PaddleNLP/llm/predictor.py", line 888, in predict
    outputs = predictor.predict(batch_source_text)
  File "/hy-tmp/PaddleNLP/llm/predictor.py", line 183, in predict
    predictions = self._infer(tokenized_source)
  File "/usr/local/miniconda3/envs/paddlenlp/lib/python3.10/site-packages/decorator.py", line 232, in fun
    return caller(func, *(extras + args), **kw)
  File "/usr/local/miniconda3/envs/paddlenlp/lib/python3.10/site-packages/paddle/fluid/dygraph/base.py", line 347, in _decorate_function
    return func(*args, **kwargs)
  File "/hy-tmp/PaddleNLP/llm/predictor.py", line 229, in _infer
    result = self.model.generate(
  File "/usr/local/miniconda3/envs/paddlenlp/lib/python3.10/site-packages/decorator.py", line 232, in fun
    return caller(func, *(extras + args), **kw)
  File "/usr/local/miniconda3/envs/paddlenlp/lib/python3.10/site-packages/paddle/fluid/dygraph/base.py", line 347, in _decorate_function
    return func(*args, **kwargs)
  File "/usr/local/miniconda3/envs/paddlenlp/lib/python3.10/site-packages/paddlenlp/generation/utils.py", line 941, in generate
    return self.sample(
  File "/usr/local/miniconda3/envs/paddlenlp/lib/python3.10/site-packages/paddlenlp/generation/utils.py", line 1141, in sample
    outputs = self(**model_inputs)
  File "/usr/local/miniconda3/envs/paddlenlp/lib/python3.10/site-packages/paddle/nn/layer/layers.py", line 1254, in __call__
    return self.forward(*inputs, **kwargs)
  File "/usr/local/miniconda3/envs/paddlenlp/lib/python3.10/site-packages/paddlenlp/transformers/opt/modeling.py", line 1058, in forward
    outputs = self.opt(
  File "/usr/local/miniconda3/envs/paddlenlp/lib/python3.10/site-packages/paddle/nn/layer/layers.py", line 1254, in __call__
    return self.forward(*inputs, **kwargs)
  File "/usr/local/miniconda3/envs/paddlenlp/lib/python3.10/site-packages/paddlenlp/transformers/opt/modeling.py", line 914, in forward
    attention_mask = self._prepare_decoder_attention_mask(attention_mask, input_shape, past_key_values_length)
  File "/usr/local/miniconda3/envs/paddlenlp/lib/python3.10/site-packages/paddlenlp/transformers/opt/modeling.py", line 790, in _prepare_decoder_attention_mask
    if input_shape[-1] > 1:
OSError: (External) CUDA error(719), unspecified launch failure. 
  [Hint: 'cudaErrorLaunchFailure'. An exception occurred on the device while executing a kernel. Common causes include dereferencing an invalid device pointerand accessing out of bounds shared memory. Less common cases can be system specific - more information about these cases canbe found in the system specific user guide. This leaves the process in an inconsistent state and any further CUDA work willreturn the same error. To continue using CUDA, the process must be terminated and relaunched.] (at ../paddle/phi/backends/gpu/gpu_context.cc:544)

My inference command:

python predictor.py \
    --model_name_or_path ./checkpoints/opt_sft_ckpts_125m \
    --data_file ./data/tuice_paddlenlp_part-00000.json \
    --dtype float16  \
    --batch_size 80  \
    --output_file ./predictor_out/opt_125m_sft_pdnlp_part-00000.json

The model is opt_125m after SFT training.
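The assertion in the log says an embedding lookup received index 2050 for a table with 2050 rows, so valid indices would be 0 to 2049. The log does not say which embedding table is involved. One way to narrow this down might be to check the indices on the CPU before they reach the GPU kernel. The following is a minimal sketch under that assumption; `find_out_of_range_ids` is a hypothetical helper for illustration, not part of PaddleNLP, and it works on plain nested lists of integers.

```python
def find_out_of_range_ids(batch, num_embeddings):
    """Return (row, col, value) for every index outside [0, num_embeddings)."""
    bad = []
    for row, seq in enumerate(batch):
        for col, idx in enumerate(seq):
            if idx < 0 or idx >= num_embeddings:
                bad.append((row, col, idx))
    return bad


# Toy batch (made-up values): the second row contains 2050,
# which is out of range for a table with 2050 rows.
batch = [[1, 5, 2049], [3, 2050, 7]]
print(find_out_of_range_ids(batch, 2050))  # [(1, 1, 2050)]
```

The same check could be applied to token ids or to position indices, depending on which table turns out to have 2050 rows.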