PaddleHub

NLP: problems when replacing the model used in the original text classification documentation

Open snoopy1316 opened this issue 2 years ago • 6 comments

I am doing text classification by following the documentation at https://github.com/PaddlePaddle/PaddleHub/tree/release/v2.1/demo/text_classification

Environment: python3.7; win10; paddlepaddle=2.2.2; paddlehub=2.2.0

The original documentation uses the model name=ernie_tiny. I want to replace it with name=ernie, version 2.0.0, so I changed the line directly to:

model = hub.Module(name='ernie', version='2.0.0', task='seq-cls')

This produces the following error:

[2022-04-02 10:13:23,557] [ WARNING] - An error was encountered while loading ernie. Detailed error information can be found in the C:\Users\admin.paddlehub\log\20220402.log.
[2022-04-02 10:13:23,561] [ WARNING] - An error was encountered while loading ernie. Detailed error information can be found in the C:\Users\admin.paddlehub\log\20220402.log.
Download https://bj.bcebos.com/paddlehub/paddlehub_dev/ernie_2.0.0.tar.gz [##################################################] 100.00%
Decompress C:\Users\admin.paddlehub\tmp\tmpkq594c3d\ernie_2.0.0.tar.gz [##################################################] 100.00%
[2022-04-02 10:13:24,844] [ INFO] - Successfully uninstalled ernie
Traceback (most recent call last):
  File "train.py", line 38, in
    model = hub.Module(name='ernie', version='2.0.0', task='seq-cls')  # select the model; model download location: C:\Users\admin.paddlenlp\models
  File "D:\Anaconda\install\envs\hub\lib\site-packages\paddlehub\module\module.py", line 395, in new
    **kwargs)
    ignore_env_mismatch=ignore_env_mismatch)
  File "D:\Anaconda\install\envs\hub\lib\site-packages\paddlehub\module\manager.py", line 190, in install
    return self._install_from_name(name, version, ignore_env_mismatch)
  File "D:\Anaconda\install\envs\hub\lib\site-packages\paddlehub\module\manager.py", line 265, in _install_from_name
    return self._install_from_url(item['url'])
  File "D:\Anaconda\install\envs\hub\lib\site-packages\paddlehub\module\manager.py", line 258, in _install_from_url
    return self._install_from_archive(file)
  File "D:\Anaconda\install\envs\hub\lib\site-packages\paddlehub\module\manager.py", line 380, in _install_from_archive
    return self._install_from_directory(directory)
  File "D:\Anaconda\install\envs\hub\lib\site-packages\paddlehub\module\manager.py", line 364, in _install_from_directory
  File "", line 983, in _find_and_load
  File "", line 967, in _find_and_load_unlocked
  File "", line 677, in _load_unlocked
  File "", line 728, in exec_module
  File "", line 219, in _call_with_frames_removed
  File "C:\Users\admin.paddlehub\modules\ernie\module.py", line 23, in
    from paddlehub.module.modeling_ernie import ErnieModel, ErnieForSequenceClassification
ModuleNotFoundError: No module named 'paddlehub.module.modeling_ernie'

How should I change this? Thanks!

snoopy1316, Apr 02 '22 02:04
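
The traceback above ends in a ModuleNotFoundError for paddlehub.module.modeling_ernie, which suggests that the downloaded ernie 2.0.0 module imports a submodule that the installed paddlehub does not ship. A minimal sketch for checking this before installing a module, using only the standard library; the helper name module_available is hypothetical and is not part of PaddleHub:

```python
import importlib.util


def module_available(dotted_name: str) -> bool:
    """Return True if `dotted_name` can be located by the import system.

    find_spec imports the parent packages of a dotted name, but not the
    target module itself.
    """
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except ImportError:
        # Raised (as ModuleNotFoundError) when a parent package is missing.
        return False


if __name__ == "__main__":
    # Module name taken from the traceback in this thread.
    name = "paddlehub.module.modeling_ernie"
    print(name, "available:", module_available(name))
```

If this reports that the module is not available, the module archive and the installed paddlehub release are probably out of step, and picking a different module version is likely easier than patching the import.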

I recommend using the latest version of the ernie model: https://www.paddlepaddle.org.cn/hubdetail?name=ernie&en_category=SemanticModel

model = hub.Module(name='ernie', version='2.0.2', task='seq-cls')

KPatr1ck, Apr 06 '22 03:04

Following this suggestion, I used the latest ernie version and ran into the following problem:

PS D:\PaddleHub\PaddleHub\demo\text_classification> python train.py
[2022-04-06 14:55:57,042] [ INFO] - Already cached C:\Users\admin.paddlenlp\models\ernie-1.0\ernie_v1_chn_base.pdparams
W0406 14:55:57.045624 18448 device_context.cc:447] Please NOTE: device: 0, GPU Compute Capability: 7.5, Driver API Version: 11.4, Runtime API Version: 10.2
W0406 14:55:57.052621 18448 device_context.cc:465] device: 0, cuDNN Version: 7.6.
[2022-04-06 14:56:00,553] [ INFO] - Already cached C:\Users\admin.paddlenlp\models\ernie-1.0\vocab.txt
[2022-04-06 14:56:07,959] [ INFO] - Already cached C:\Users\admin.paddlenlp\models\ernie-1.0\vocab.txt
[2022-04-06 14:56:08,890] [ INFO] - Already cached C:\Users\admin.paddlenlp\models\ernie-1.0\vocab.txt
[2022-04-06 14:56:09,814] [ INFO] - PaddleHub model checkpoint loaded. current_epoch=9 [acc=0.9400]
Traceback (most recent call last):
  File "train.py", line 56, in
    save_interval=args.save_interval,
  File "D:\Anaconda\install\envs\hub\lib\site-packages\paddlehub\finetune\trainer.py", line 213, in train
    self.optimizer_step(self.current_epoch, batch_idx, self.optimizer, loss)
  File "D:\Anaconda\install\envs\hub\lib\site-packages\paddlehub\finetune\trainer.py", line 395, in optimizer_step
    self.optimizer.step()
  File "D:\Anaconda\install\envs\hub\lib\site-packages\decorator.py", line 232, in fun
    return caller(func, *(extras + args), **kw)
  File "D:\Anaconda\install\envs\hub\lib\site-packages\paddle\fluid\dygraph\base.py", line 296, in impl
    return func(*args, **kwargs)
  File "D:\Anaconda\install\envs\hub\lib\site-packages\decorator.py", line 232, in fun
    return caller(func, *(extras + args), **kw)
  File "D:\Anaconda\install\envs\hub\lib\site-packages\paddle\fluid\wrapped_decorator.py", line 25, in impl
    return wrapped_func(*args, **kwargs)
  File "D:\Anaconda\install\envs\hub\lib\site-packages\paddle\fluid\framework.py", line 229, in impl
    return func(*args, **kwargs)
  File "D:\Anaconda\install\envs\hub\lib\site-packages\paddle\optimizer\adam.py", line 422, in step
    loss=None, startup_program=None, params_grads=params_grads)
  File "D:\Anaconda\install\envs\hub\lib\site-packages\paddle\optimizer\optimizer.py", line 891, in _apply_optimize
    optimize_ops = self._create_optimization_pass(params_grads)
  File "D:\Anaconda\install\envs\hub\lib\site-packages\paddle\optimizer\adamw.py", line 372, in _create_optimization_pass
    AdamW, self)._create_optimization_pass(parameters_and_grads)
  File "D:\Anaconda\install\envs\hub\lib\site-packages\paddle\optimizer\optimizer.py", line 677, in _create_optimization_pass
    [p[0] for p in parameters_and_grads if not p[0].stop_gradient])
  File "D:\Anaconda\install\envs\hub\lib\site-packages\paddle\optimizer\adam.py", line 297, in _create_accumulators
    self._add_moments_pows(p)
  File "D:\Anaconda\install\envs\hub\lib\site-packages\paddle\optimizer\adam.py", line 262, in _add_moments_pows
    self._add_accumulator(self._moment1_acc_str, p, dtype=acc_dtype)
  File "D:\Anaconda\install\envs\hub\lib\site-packages\paddle\optimizer\optimizer.py", line 593, in _add_accumulator
    var.set_value(self._accumulators_holder[var_name])
  File "D:\Anaconda\install\envs\hub\lib\site-packages\decorator.py", line 232, in fun
    return caller(func, *(extras + args), **kw)
  File "D:\Anaconda\install\envs\hub\lib\site-packages\paddle\fluid\wrapped_decorator.py", line 25, in impl
    return wrapped_func(*args, **kwargs)
  File "D:\Anaconda\install\envs\hub\lib\site-packages\paddle\fluid\framework.py", line 229, in impl
    return func(*args, **kwargs)
  File "D:\Anaconda\install\envs\hub\lib\site-packages\paddle\fluid\dygraph\varbase_patch_methods.py", line 170, in set_value
    self.name, self_tensor_np.shape, value_np.shape)
AssertionError: Variable Shape not match, Variable [ embedding_0.w_0_moment1_0 ] need tensor with shape (18000, 768) but load set tensor with shape (50006, 1024)

snoopy1316, Apr 06 '22 06:04

[2022-04-06 14:56:09,814] [ INFO] - PaddleHub model checkpoint loaded. current_epoch=9 [acc=0.9400]

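Judging from the error, the parameter shapes do not match: a checkpoint was loaded (see the log line quoted above), and its shapes differ from those of the new model. After switching models you need to retrain.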

KPatr1ck, Apr 06 '22 09:04
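
The shape mismatch described in this reply can be detected before training resumes by comparing the shapes the model expects with the shapes stored in the checkpoint. The following is a minimal sketch over plain dictionaries of name to shape; it does not call PaddlePaddle, and the function name find_shape_mismatches is hypothetical. The variable name and shapes in the usage example are the ones from the AssertionError in this thread.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def find_shape_mismatches(expected: Dict[str, Shape],
                          loaded: Dict[str, Shape]) -> List[str]:
    """List variables whose checkpoint shape differs from what the model needs."""
    problems = []
    for name, want in expected.items():
        got = loaded.get(name)
        if got is None:
            problems.append(f"{name}: missing from checkpoint")
        elif tuple(got) != tuple(want):
            problems.append(f"{name}: model needs {want}, checkpoint has {got}")
    return problems


# Shapes reported by the AssertionError in this thread.
model_shapes = {"embedding_0.w_0_moment1_0": (18000, 768)}
checkpoint_shapes = {"embedding_0.w_0_moment1_0": (50006, 1024)}

for line in find_shape_mismatches(model_shapes, checkpoint_shapes):
    print(line)
```

A non-empty result means the checkpoint was most likely written by a different model and should not be resumed from.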

AssertionError: Variable Shape not match, Variable [ embedding_0.w_0_moment1_0 ] need tensor with shape (18000, 768) but load set tensor with shape (50006, 1024)

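I followed your advice. A checkpoint file generated earlier with the other model was still present, so I created a new checkpoint2 folder under text_classification to save the checkpoints, and that solved the problem.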

snoopy1316, Apr 07 '22 01:04
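
The fix reported above (a separate checkpoint folder for the new model) can be made systematic by deriving the checkpoint directory from the model name and version, so that checkpoints from different models never share a folder. This is a minimal sketch using only pathlib; the helper checkpoint_dir_for and the base folder name are hypothetical, and it assumes train.py lets you choose the checkpoint directory, as the demo appears to.

```python
from pathlib import Path


def checkpoint_dir_for(base_dir: str, model_name: str, version: str) -> Path:
    """Build a checkpoint directory that is unique per model name and version."""
    # Replace path separators so the result stays a single folder name.
    folder = f"{model_name}-{version}".replace("/", "_").replace("\\", "_")
    return Path(base_dir) / folder


# Example: the model and version recommended earlier in this thread.
print(checkpoint_dir_for("checkpoint", "ernie", "2.0.2").as_posix())
```

The resulting path would then be passed to the training script in place of the shared checkpoint folder.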

[2022-04-06 14:56:09,814] [ INFO] - PaddleHub model checkpoint loaded. current_epoch=9 [acc=0.9400]

In the text matching demo, I switched from ernie-tiny to bert-base-chinese or to the ernie 2.0.2 model and retrained, and this problem appeared:

PS D:\PaddleHub\PaddleHub\demo\text_matching> python train.py
[2022-04-07 14:49:49,768] [ INFO] - Already cached C:\Users\admin.paddlenlp\models\bert-base-chinese\bert-base-chinese.pdparams
W0407 14:49:49.770637 18872 device_context.cc:447] Please NOTE: device: 0, GPU Compute Capability: 7.5, Driver API Version: 11.4, Runtime API Version: 10.2
W0407 14:49:49.778635 18872 device_context.cc:465] device: 0, cuDNN Version: 7.6.
[2022-04-07 14:49:54,666] [ INFO] - Weights from pretrained model not used in BertModel: ['cls.predictions.decoder_weight', 'cls.predictions.decoder_bias', 'cls.predictions.transform.weight', 'cls.predictions.transform.bias', 'cls.predictions.layer_norm.weight', 'cls.predictions.layer_norm.bias', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias']
[2022-04-07 14:49:55,127] [ INFO] - Already cached C:\Users\admin.paddlenlp\models\bert-base-chinese\bert-base-chinese-vocab.txt
[2022-04-07 14:50:50,766] [ WARNING] - PaddleHub model checkpoint not found, start from scratch...

The run ends at this point. How can I solve this?

snoopy1316, Apr 08 '22 01:04