PaddleNLP
[Question]: LoRA training problem with gpt3
Please ask your question
0. Environment: PaddlePaddle platform, PaddlePaddle 2.6.0
!pip3 install --pre --upgrade paddlenlp -f https://www.paddlepaddle.org.cn/whl/paddlenlp.html
!pip install tool_helpers visualdl==2.5.3
!pip install --upgrade paddlepaddle-gpu
!pip install rouge
!pip install regex
Model: a model obtained by continuing training from gpt-cpm-small-cn-distill.
1. Training on the PaddlePaddle platform produces the following error:
[2024-03-08 17:37:31,969] [ WARNING] - Process rank: -1, device: gpu, world_size: 1, distributed training: False, 16-bits training: True
Traceback (most recent call last):
File "/home/aistudio/PaddleNLP/llm/gpt-3/finetune_generation.py", line 250, in
--model_name_or_path output/gpt3_hybrid/checkpoint-158000
--output_dir "outputlora/$task_name"
--per_device_train_batch_size 2
--per_device_eval_batch_size 1
--tensor_parallel_degree 1
--pipeline_parallel_degree 1
--fp16
--fp16_opt_level "O2"
--scale_loss 1024
--learning_rate 3e-4
--max_steps 10000
--save_steps 5000
--weight_decay 0.01
--warmup_ratio 0.01
--max_grad_norm 1.0
--logging_steps 1
--dataloader_num_workers 1
--sharding "stage2"
--eval_steps 1000
--report_to "visualdl"
--disable_tqdm true
--recompute 1
--gradient_accumulation_steps 2
--do_train
--do_eval
--device "gpu"
--lora
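Apart from the following two parameters, the command is essentially copied from the documentation, and I have not modified any of the code.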
--model_name_or_path output/gpt3_hybrid/checkpoint-158000
--output_dir "outputlora/$task_name" \
2. If LoRA training does run normally, how should I use the result? Where can I find it documented, or is there concrete example code?
The error you are currently hitting can be resolved by adding the parameter --model_type "gpt" to the command.
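To make the suggestion above concrete, here is a minimal Python sketch that adds --model_type gpt to a flag list and prints the resulting command line. The flag values are copied from the command in this thread and abbreviated. The helper name with_model_type and the launcher prefix "python finetune_generation.py" are assumptions for illustration only, since the thread does not show how the script was launched.

```python
import shlex


def with_model_type(flags, model_type="gpt"):
    """Return a copy of flags with --model_type set to model_type.

    Hypothetical helper: if --model_type is already present its value is
    replaced, otherwise the flag is added at the front.
    """
    flags = list(flags)
    if "--model_type" in flags:
        idx = flags.index("--model_type")
        if idx + 1 < len(flags):
            flags[idx + 1] = model_type
        else:
            flags.append(model_type)
        return flags
    return ["--model_type", model_type] + flags


# A few of the flags from the command in this thread (the rest are omitted).
flags = [
    "--model_name_or_path", "output/gpt3_hybrid/checkpoint-158000",
    "--fp16",
    "--lora",
]

# Assumed launcher prefix; adapt it to however the script is actually started.
command = shlex.join(["python", "finetune_generation.py", *with_model_type(flags)])
print(command)
```

Only the added flag comes from the reply above; everything else should stay as in your own command.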
After that, the following appeared:
[2024-03-13 17:41:08,647] [ DEBUG] - Number of trainable parameters = 1,327,104 (per device)
/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddlenlp/transformers/tokenizer_utils_base.py:1925: UserWarning: Truncation was not explicitly activated but max_length is provided a specific value, please use truncation=True to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to truncation.
warnings.warn(
Building prefix dict from the default dictionary ...
[2024-03-13 17:41:08,713] [ DEBUG] __init__.py:113 - Building prefix dict from the default dictionary ...
Loading model from cache /tmp/jieba.cache
[2024-03-13 17:41:08,714] [ DEBUG] __init__.py:132 - Loading model from cache /tmp/jieba.cache
Loading model cost 1.149 seconds.
[2024-03-13 17:41:09,862] [ DEBUG] __init__.py:164 - Loading model cost 1.149 seconds.
Prefix dict has been built successfully.
[2024-03-13 17:41:09,862] [ DEBUG] __init__.py:166 - Prefix dict has been built successfully.
/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddlenlp/transformers/tokenizer_utils_base.py:1954: UserWarning: max_length is ignored when padding=True and there is no truncation strategy. To pad to max length, use padding='max_length'.
warnings.warn(
[2024-03-13 17:41:09,887] [ ERROR] - Using pad_token, but it is not set yet.
Exception in thread Thread-2 (_thread_loop):
Traceback (most recent call last):
File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
Traceback (most recent call last):
File "/home/aistudio/PaddleNLP/llm/gpt-3/finetune_generation.py", line 250, in
ValueError File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddle/io/dataloader/dataloader_iter.py", line 825, in __next__
: DataLoader worker(0) caught ValueError with message:
Traceback (most recent call last):
File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddle/io/dataloader/worker.py", line 372, in _worker_loop
batch = fetcher.fetch(indices)
File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddle/io/dataloader/fetcher.py", line 85, in fetch
data = self.collate_fn(data)
File "/home/aistudio/PaddleNLP/llm/gpt-3/utils.py", line 314, in __call__
batch = self.tokenizer.pad(
File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddlenlp/transformers/tokenizer_utils_base.py", line 2717, in pad
padding_strategy, _, max_length, _ = self._get_padding_truncation_strategies(
File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddlenlp/transformers/tokenizer_utils_base.py", line 2026, in _get_padding_truncation_strategies
raise ValueError(
ValueError: Asking to pad but the tokenizer does not have a padding token. Please select a token to use as pad_token (tokenizer.pad_token = tokenizer.eos_token e.g.) or add a new pad token via tokenizer.add_special_tokens({'pad_token': '[PAD]'}).
self._reader.read_next_list()[0]
SystemError: (Fatal) Blocking queue is killed because the data reader raises an exception.
[Hint: Expected killed_ != true, but received killed_:1 == true:1.] (at /paddle/paddle/fluid/operators/reader/blocking_queue.h:175)
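How should I change this? This is what an AI assistant told me.
But I don't know how to make the change. Or maybe the problem is not what the image says at all. Any pointers would be appreciated.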
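It then said I could do this:
from paddlenlp.transformers import GPTTokenizer
# Assume the tokenizer has been loaded correctly
tokenizer = GPTTokenizer.from_pretrained('your model path')
# Encode your text data
encoded_inputs = tokenizer(
    texts,
    padding='max_length',  # make sure all sequences have the same length
    truncation=True,       # anything beyond the maximum length is truncated
    max_length=512,        # set the maximum sequence length
)
The following is what I found by searching, but it does not seem to have been implemented: https://github.com/PaddlePaddle/PaddleNLP/issues/8023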
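The ValueError in the traceback itself names one option: tokenizer.pad_token = tokenizer.eos_token. A minimal sketch of that idea follows, written as a small guard that could be called right after the tokenizer is created. The helper name ensure_pad_token is hypothetical, the stub object and its "<eos>" value only stand in for a real tokenizer so the snippet runs on its own, and where exactly to call it inside finetune_generation.py is an assumption this thread does not confirm.

```python
from types import SimpleNamespace


def ensure_pad_token(tokenizer):
    """Reuse the EOS token as the pad token when no pad token is set.

    Hypothetical helper; it only relies on the tokenizer exposing
    `pad_token` and `eos_token` attributes, as the error message implies.
    """
    if getattr(tokenizer, "pad_token", None) is None:
        if getattr(tokenizer, "eos_token", None) is None:
            raise ValueError("tokenizer has neither a pad_token nor an eos_token")
        tokenizer.pad_token = tokenizer.eos_token
    return tokenizer


# Stand-in for a real tokenizer (assumption: the real one has no pad token,
# which is what the "Using pad_token, but it is not set yet." log line reports).
stub = SimpleNamespace(pad_token=None, eos_token="<eos>")
ensure_pad_token(stub)
print(stub.pad_token)
```

With a real tokenizer the call would be ensure_pad_token(tokenizer) before the data collator pads its first batch. Whether reusing the EOS token is appropriate for this particular model is not confirmed anywhere in the thread.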
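Second question, about generation with the model that was further trained from gpt-cpm-small-cn-distill: after training there is a config.json file,
which contains the parameter "dtype": "float16". If I generate with predict_generation.py, I get: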
Traceback (most recent call last):
File "D:\AI\PaddleNLP\llm\gpt-3\predict_generation.py", line 165, in
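The traceback above is cut off, so the actual cause is not visible here. As a first check, it may help to confirm which dtype the checkpoint's config.json records. The sketch below does only that, using the standard library. The function name read_checkpoint_dtype is hypothetical, and the temporary directory stands in for a real checkpoint directory.

```python
import json
import tempfile
from pathlib import Path


def read_checkpoint_dtype(checkpoint_dir):
    """Return the "dtype" entry of <checkpoint_dir>/config.json, or None if absent."""
    config_path = Path(checkpoint_dir) / "config.json"
    with config_path.open(encoding="utf-8") as f:
        config = json.load(f)
    return config.get("dtype")


# Demonstration on a throwaway directory; point the function at the real
# checkpoint directory instead.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "config.json").write_text(
        json.dumps({"dtype": "float16"}), encoding="utf-8"
    )
    recorded_dtype = read_checkpoint_dtype(tmp)

print(recorded_dtype)
```

This only reports the recorded value. Whether that value is what makes predict_generation.py fail cannot be determined from the truncated traceback.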
This issue is stale because it has been open for 60 days with no activity.
Which versions of paddle and paddlenlp are you using?
This issue is stale because it has been open for 60 days with no activity.
This issue was closed because it has been inactive for 14 days since being marked as stale.