PaddleNLP

[Question]: Fine-tuning Baichuan2 fails when calling resize_token_embeddings; how can I modify the source code so this function works?

Open wanghao19970205 opened this issue 1 year ago • 4 comments

Please describe your question

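Background: I'm adding new tokens and doing full-parameter fine-tuning of Baichuan2. I call add_tokens first and then resize_token_embeddings, which raises an error saying Old embeddings are of type <class 'paddle.distributed.fleet.layers.mpu.mp_layers.VocabParallelEmbedding'>, which is not an instance of <class 'paddle.nn.layer.common.Embedding'>. I commented out the corresponding raise statement, but on the next run I got the error below. The root cause is that old_embeddings has no _padding_idx attribute. Can I simply hard-code a value for it, or could support be added for models like Baichuan2 whose embedding type doesn't match?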

Traceback (most recent call last):
  File "/root/paddlejob/workspace/wanghao81/env/PaddleNLP/llm/finetune_generation.py", line 631, in <module>
    main()
  File "/root/paddlejob/workspace/wanghao81/env/PaddleNLP/llm/finetune_generation.py", line 478, in main
    model.resize_token_embeddings(len(tokenizer))
  File "/root/paddlejob/workspace/wanghao81/env/PaddleNLP/paddle-env/lib/python3.10/site-packages/paddlenlp/transformers/model_utils.py", line 1327, in resize_token_embeddings
    new_embeddings = self._get_resized_embeddings(old_embeddings, new_num_tokens)
  File "/root/paddlejob/workspace/wanghao81/env/PaddleNLP/paddle-env/lib/python3.10/site-packages/paddlenlp/transformers/model_utils.py", line 1395, in _get_resized_embeddings
    padding_idx=old_embeddings._padding_idx,
  File "/root/paddlejob/workspace/wanghao81/env/PaddleNLP/paddle-env/lib/python3.10/site-packages/paddle/nn/layer/layers.py", line 1657, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'VocabParallelEmbedding' object has no attribute '_padding_idx'

wanghao19970205 • Jan 02 '24 03:01
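To illustrate what the traceback above is about, here is a minimal, framework-free sketch of the resize step, using plain Python lists instead of Paddle tensors. It assumes the resize allocates a new weight with `new_num_tokens` rows and copies the first `min(old, new)` rows from the old weight. It reads the padding index with `getattr(..., "_padding_idx", None)`, so a layer that lacks the attribute falls back to "no padding index", which matches the default `padding_idx=None` of `paddle.nn.Embedding`. `ToyEmbedding`, `ToyParallelEmbedding`, and `resize_rows` are hypothetical names for illustration only, and the random initializer for new rows is an assumption.

```python
import random


class ToyEmbedding:
    """Hypothetical stand-in for a single-card embedding that has _padding_idx."""

    def __init__(self, weight, padding_idx=None):
        self.weight = weight  # list of rows, one per token id
        self._padding_idx = padding_idx


class ToyParallelEmbedding:
    """Hypothetical stand-in for a layer without _padding_idx, as in the traceback."""

    def __init__(self, weight):
        self.weight = weight


def resize_rows(old, new_num_tokens, std=0.02, seed=0):
    """Return (new_weight, padding_idx) for `old` resized to new_num_tokens rows."""
    rng = random.Random(seed)
    dim = len(old.weight[0])
    # Start from freshly initialized rows (this initializer is an assumption).
    new_weight = [[rng.gauss(0.0, std) for _ in range(dim)] for _ in range(new_num_tokens)]
    # Copy the learned rows for token ids present in both vocabularies.
    n = min(len(old.weight), new_num_tokens)
    for i in range(n):
        new_weight[i] = list(old.weight[i])
    # Fall back to None instead of raising AttributeError when _padding_idx is missing.
    padding_idx = getattr(old, "_padding_idx", None)
    return new_weight, padding_idx


old = ToyParallelEmbedding([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
new_weight, padding_idx = resize_rows(old, 5)
print(len(new_weight), new_weight[:3], padding_idx)
# 5 [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]] None
```

Falling back to `None` only avoids the AttributeError. Under tensor parallelism each rank's weight holds only one slice of the vocabulary, so copying rows like this on a single rank would not yield a correct resized layer.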

Specifically, I added the following code at https://github.com/PaddlePaddle/PaddleNLP/blob/develop/llm/finetune_generation.py#L473:

    add_tokens = ["xxx", "9510727369170435113", "9818903255735709281", "3812371869983320904"]
    tokenizer.add_tokens(add_tokens)
    # model.get_output_embeddings().bias = None  # for compatibility
    model.resize_token_embeddings(len(tokenizer))

wanghao19970205 • Jan 02 '24 03:01

Addendum: this doesn't seem limited to Baichuan2. The Llama 2 series hits exactly the same problem: Old embeddings are of type <class 'paddle.distributed.fleet.layers.mpu.mp_layers.VocabParallelEmbedding'>, which is not an instance of <class 'paddle.nn.layer.common.Embedding'>. It looks like a general case; could you check whether there is a general solution?

wanghao19970205 • Jan 02 '24 03:01

Sorry, resize_token_embeddings has not been adapted for Tensor Parallel models. You can train on a single GPU instead.

ZHUI • Jan 18 '24 10:01
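The reply above can be made concrete with a small sketch of how a vocabulary-parallel embedding is typically laid out. It assumes the vocabulary is split into equal, contiguous shards, one per tensor-parallel rank, and that the vocabulary size must divide evenly by the number of ranks. Both are assumptions about the usual layout, not a description of Paddle's `VocabParallelEmbedding` internals, and `shard_ranges` is a hypothetical helper. Because every shard boundary depends on the total vocabulary size, growing the vocabulary shifts the boundaries on every rank, so a resize cannot simply append rows the way the single-card path does.

```python
def shard_ranges(vocab_size, num_ranks):
    """Return the [start, end) token-id range owned by each rank (hypothetical helper)."""
    # Assumed constraint: equal shards require an evenly divisible vocabulary.
    if vocab_size % num_ranks != 0:
        raise ValueError(f"vocab_size {vocab_size} is not divisible by {num_ranks} ranks")
    per_rank = vocab_size // num_ranks
    return [(r * per_rank, (r + 1) * per_rank) for r in range(num_ranks)]


print(shard_ranges(8, 2))   # [(0, 4), (4, 8)]
print(shard_ranges(12, 2))  # [(0, 6), (6, 12)]: ids 4 and 5 move from rank 1 to rank 0
```

Under these assumptions, a resize in the tensor-parallel case would have to redistribute existing rows across ranks, which is one reason training on a single GPU is the simpler workaround.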

This issue is stale because it has been open for 60 days with no activity.

github-actions[bot] • Apr 27 '24 00:04