> The error you are currently seeing can be resolved by adding a `--model_type "gpt"` argument to the command.

After adding it, the following appeared:

```
[2024-03-13 17:41:08,647] [ DEBUG] - Number of trainable parameters = 1,327,104 (per device)
/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddlenlp/transformers/tokenizer_utils_base.py:1925: UserWarning: Truncation was not explicitly activated but `max_length` is provided a specific...
```
**Then it said this could be done:**

```python
from paddlenlp.transformers import GPTTokenizer

# Assuming the tokenizer has already been loaded correctly
tokenizer = GPTTokenizer.from_pretrained('your-model-path')

# Encode your text data
encoded_inputs = tokenizer(texts,
                           padding='max_length',  # make all sequences the same length
                           truncation=True,       # anything beyond max_length is cut off
                           max_length=512)        # maximum sequence length
```

**The following is what I found by searching, but it does not seem to have been implemented:** https://github.com/PaddlePaddle/PaddleNLP/issues/8023
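To make that suggestion runnable end to end, here is a minimal sketch. The checkpoint name and sample texts are assumptions, and note one swap: for gpt-cpm checkpoints, PaddleNLP pairs the model with `GPTChineseTokenizer` rather than the `GPTTokenizer` shown above, so use whichever class matches your checkpoint.

```python
from paddlenlp.transformers import GPTChineseTokenizer

# "gpt-cpm-small-cn-distill" is assumed here because it is the base model
# named below; substitute your own fine-tuned checkpoint path.
tokenizer = GPTChineseTokenizer.from_pretrained("gpt-cpm-small-cn-distill")

# Placeholder sample data.
texts = ["第一条测试文本。", "第二条稍微长一些的测试文本。"]

encoded_inputs = tokenizer(
    texts,
    padding="max_length",  # pad every example up to max_length
    truncation=True,       # explicitly opt in to truncation, which silences the UserWarning
    max_length=512,        # fixed sequence length
)
# Every example in encoded_inputs["input_ids"] now has exactly 512 tokens.
```

Passing `truncation=True` explicitly is what the warning is asking for: `max_length` alone does not tell the tokenizer what to do with inputs longer than the limit.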
**Second question**, about generation with a model further trained from gpt-cpm-small-cn-distill: after training finishes there is a config.json file containing a `"dtype": "float16"` field. If I run generation with predict_generation.py, I get:

```
Traceback (most recent call last):
  File "D:\AI\PaddleNLP\llm\gpt-3\predict_generation.py", line 165, in <module>
    predict()
  File "D:\AI\PaddleNLP\llm\gpt-3\predict_generation.py", line 157, in predict
    outputs = predictor.predict(texts)
  File "D:\AI\PaddleNLP\llm\gpt-3\predict_generation.py", line 133, in predict
    infer_result...
```
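Not a verified fix, but a workaround worth trying while this is investigated: float16 kernels generally need GPU support, so forcing the saved checkpoint back to float32 before loading may get predict_generation.py past the error. A minimal sketch, where the checkpoint path is a placeholder:

```python
import json
from pathlib import Path

# Placeholder: point this at the directory containing your fine-tuned checkpoint.
config_path = Path("path/to/your/checkpoint/config.json")

config = json.loads(config_path.read_text(encoding="utf-8"))
print("current dtype:", config.get("dtype"))  # expected: float16

# Rewrite the saved dtype so the predictor loads float32 weights instead.
config["dtype"] = "float32"
config_path.write_text(json.dumps(config, indent=2, ensure_ascii=False), encoding="utf-8")
```

After this change, rerunning predict_generation.py should load the weights in float32; if the traceback persists, the dtype field is likely not the root cause.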