PaddleNLP
[Question]: Fine-tuning Baichuan 2 fails when calling resize_token_embeddings. How should the source code be changed so this feature can be reused?
Please describe your question
Background: I want to add new tokens and then fully fine-tune Baichuan 2. I first call add_tokens, but the subsequent call to resize_token_embeddings fails with: Old embeddings are of type <class 'paddle.distributed.fleet.layers.mpu.mp_layers.VocabParallelEmbedding'>, which is not an instance of <class 'paddle.nn.layer.common.Embedding'>. I commented out the corresponding raise statement and ran again, and then hit the error below: old_embeddings has no _padding_idx attribute. Can this value simply be hard-coded, or could the code be made compatible with models like Baichuan 2 whose embedding type differs?
Traceback (most recent call last):
  File "/root/paddlejob/workspace/wanghao81/env/PaddleNLP/llm/finetune_generation.py", line 631, in <module>
    main()
  File "/root/paddlejob/workspace/wanghao81/env/PaddleNLP/llm/finetune_generation.py", line 478, in main
    model.resize_token_embeddings(len(tokenizer))
  File "/root/paddlejob/workspace/wanghao81/env/PaddleNLP/paddle-env/lib/python3.10/site-packages/paddlenlp/transformers/model_utils.py", line 1327, in resize_token_embeddings
    new_embeddings = self._get_resized_embeddings(old_embeddings, new_num_tokens)
  File "/root/paddlejob/workspace/wanghao81/env/PaddleNLP/paddle-env/lib/python3.10/site-packages/paddlenlp/transformers/model_utils.py", line 1395, in _get_resized_embeddings
    padding_idx=old_embeddings._padding_idx,
  File "/root/paddlejob/workspace/wanghao81/env/PaddleNLP/paddle-env/lib/python3.10/site-packages/paddle/nn/layer/layers.py", line 1657, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'VocabParallelEmbedding' object has no attribute '_padding_idx'
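The crash above comes from an unconditional read of old_embeddings._padding_idx. A minimal, framework-agnostic sketch of how that lookup could be made tolerant of layers lacking the attribute (the class names below are stand-ins for illustration, not PaddleNLP code):

```python
class PlainEmbedding:
    """Stand-in for an embedding layer that stores _padding_idx."""
    def __init__(self, padding_idx=None):
        self._padding_idx = padding_idx


class VocabParallelLike:
    """Stand-in for VocabParallelEmbedding, which has no _padding_idx."""
    pass


def safe_padding_idx(layer):
    # getattr with a default turns the AttributeError into a harmless None,
    # so both layer types can flow through the same resize path.
    return getattr(layer, "_padding_idx", None)
```

Whether None is a semantically correct padding index for a tensor-parallel embedding is a separate question; this only shows how the hard AttributeError could be avoided.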
https://github.com/PaddlePaddle/PaddleNLP/blob/develop/llm/finetune_generation.py#L473
Specifically, I added the following code at that location:
add_tokens = ["xxx", "9510727369170435113", "9818903255735709281", "3812371869983320904"]
tokenizer.add_tokens(add_tokens)
# model.get_output_embeddings().bias = None  # compatibility workaround
model.resize_token_embeddings(len(tokenizer))
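For context, the core of resize_token_embeddings is conceptually simple: copy the overlapping rows of the old embedding matrix into a new, larger matrix, and initialize the rows for newly added tokens. A rough sketch of that logic in plain Python (illustration only, not the PaddleNLP implementation, which also handles weight tying and device placement):

```python
import random


def resize_embedding_rows(old_weight, new_num_tokens, seed=0):
    """Return a new_num_tokens x hidden matrix: existing rows are copied,
    rows for newly added tokens get small random values."""
    rng = random.Random(seed)
    hidden = len(old_weight[0])
    keep = min(len(old_weight), new_num_tokens)
    # Copy the rows that survive the resize.
    new_weight = [list(old_weight[i]) for i in range(keep)]
    # Initialize any extra rows (one per newly added token).
    for _ in range(new_num_tokens - keep):
        new_weight.append([rng.gauss(0.0, 0.02) for _ in range(hidden)])
    return new_weight
```

The tensor-parallel complication is that a VocabParallelEmbedding stores only a shard of the vocabulary per card, so this row-copying step would have to be coordinated across ranks rather than done locally.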
Update: this does not seem to be limited to Baichuan 2; the Llama 2 series hits the same problem, with the identical error: Old embeddings are of type <class 'paddle.distributed.fleet.layers.mpu.mp_layers.VocabParallelEmbedding'>, which is not an instance of <class 'paddle.nn.layer.common.Embedding'>. It looks like a general case, so please check whether there is a general solution.
Sorry, models using Tensor Parallel have not been adapted for resize_token_embeddings. You can train on a single card instead.
This issue is stale because it has been open for 60 days with no activity.