A large custom dataset causes an error; how should the code be modified?

Open yizhipipixia opened this issue 4 years ago • 6 comments

Project link: https://github.com/PaddlePaddle/models/tree/4d87afd6480737b64b5974c9c40a5b1c5a4600b3/PaddleNLP/examples/text_classification/rnn

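I replaced train.tsv, dev.tsv, and test.tsv under C:\Users\Administrator.paddlenlp\datasets\chnsenticorp with my own data and then ran training. When the training set has about 30,000 samples in total, there is no error and training produces a model. With more samples than that, the error below occurs. How should I modify the code? Please explain in as much detail as you can; piecing together a dataset of 320,000 samples was not easy. Any help is appreciated! The error output is as follows:

`step 30/47 - loss: 0.3494 - acc: 0.9693 - 290ms/step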

step 40/47 - loss: 0.3437 - acc: 0.9691 - 301ms/step

Traceback (most recent call last):

File "train.py", line 193, in save_dir=args.save_dir)

File "F:\aanaa\lib\site-packages\paddle\hapi\model.py", line 1503, in fit eval_logs = self._run_one_epoch(eval_loader, cbks, 'eval')

File "F:\aanaa\lib\site-packages\paddle\hapi\model.py", line 1799, in _run_one_epoch data[len(self._inputs):])

File "F:\aanaa\lib\site-packages\paddle\hapi\model.py", line 991, in eval_batch loss = self._adapter.eval_batch(inputs, labels)

File "F:\aanaa\lib\site-packages\paddle\hapi\model.py", line 681, in eval_batch outputs = self.model.network.forward(* [to_variable(x) for x in inputs])

File "F:\aanaa\lib\site-packages\paddlenlp\models\senta.py", line 104, in forward logits = self.model(text, seq_len)

File "F:\aanaa\lib\site-packages\paddle\fluid\dygraph\layers.py", line 884, in call outputs = self.forward(*inputs, **kwargs)

File "F:\aanaa\lib\site-packages\paddlenlp\models\senta.py", line 186, in forward embedded_text = self.embedder(text)

File "F:\aanaa\lib\site-packages\paddle\fluid\dygraph\layers.py", line 884, in call outputs = self.forward(*inputs, **kwargs)

File "F:\aanaa\lib\site-packages\paddle\nn\layer\common.py", line 1289, in forward name=self._name)

File "F:\aanaa\lib\site-packages\paddle\nn\functional\input.py", line 202, in embedding 'remote_prefetch', False, 'padding_idx', padding_idx)

ValueError: (InvalidArgument) Variable value (input) of OP(fluid.layers.embedding) expected >= 0 and < 857580, but got 858325. Please check input value.

[Hint: Expected ids[i] < row_number, but received ids[i]:858325 >= row_number:857580.] (at D:\2.0.0rc1\paddle\paddle/fluid/operators/lookup_table_v2_op.h:81)

[Hint: If you need C++ stacktraces for debugging, please set FLAGS_call_stack_level=2.]

[operator < lookup_table_v2 > error] `

yizhipipixia, Dec 30 '20 16:12

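A preliminary analysis suggests a size mismatch between the embedding and the input data. I will check with the model owner about the specific changes needed.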
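To illustrate the kind of mismatch meant here, the following is a minimal pure-Python sketch of an embedding lookup. It is not Paddle's implementation: the function name `embedding_lookup` and the toy table are hypothetical, and the range check only mirrors the condition quoted in the traceback hint (`ids[i] < row_number`).

```python
def embedding_lookup(table, ids):
    """Return the rows of `table` selected by `ids`.

    Illustrative stand-in for an embedding layer: `table` has one row per
    vocabulary entry, so every id must satisfy 0 <= id < len(table).
    """
    row_number = len(table)
    for i, idx in enumerate(ids):
        if not 0 <= idx < row_number:
            raise ValueError(f"ids[{i}]={idx} is outside [0, {row_number})")
    return [table[idx] for idx in ids]


# Toy table with 4 rows (a 4-word vocabulary), 2 dimensions each.
table = [[0.0, 0.0], [0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]

print(embedding_lookup(table, [1, 3]))  # both ids are valid rows

try:
    embedding_lookup(table, [1, 4])  # 4 >= row_number, so the lookup fails
except ValueError as err:
    print("lookup failed:", err)
```

In the reported error the table has 857580 rows and the offending id is 858325, so the same condition is violated regardless of how many samples or tokens are in a batch.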

GaoWei8, Dec 31 '20 05:12

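> A preliminary analysis suggests a size mismatch between the embedding and the input data. I will check with the model owner about the specific changes needed.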

OK, thank you for looking into it.

yizhipipixia, Dec 31 '20 05:12

@yizhipipixia Hello! Judging from the error message, an input word id exceeds the vocabulary size. Please check the input word ids and the vocabulary size.
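One possible way to run such a check is sketched below. It assumes a plain-text vocabulary with one token per line and examples that are already converted to id lists. The helper names (`load_vocab`, `find_out_of_range_ids`, `tokens_to_ids`) and the `[UNK]` token are hypothetical and are not part of PaddleNLP; the thread does not confirm the actual cause, so this is only a diagnostic sketch.

```python
def load_vocab(lines):
    """Map each token to its line index (one token per line)."""
    return {line.rstrip("\n"): i for i, line in enumerate(lines)}


def find_out_of_range_ids(examples, vocab_size):
    """Return (example_index, position, id) for ids outside [0, vocab_size)."""
    bad = []
    for ex_idx, ids in enumerate(examples):
        for pos, idx in enumerate(ids):
            if not 0 <= idx < vocab_size:
                bad.append((ex_idx, pos, idx))
    return bad


def tokens_to_ids(tokens, vocab, unk_token="[UNK]"):
    """Convert tokens to ids, mapping tokens missing from vocab to the unk id."""
    unk_id = vocab[unk_token]
    return [vocab.get(tok, unk_id) for tok in tokens]


# Toy vocabulary with 4 entries; in practice the lines would come from a file.
vocab = load_vocab(["[PAD]\n", "[UNK]\n", "good\n", "bad\n"])

# The second example contains id 7, which is not a valid row for 4 entries.
examples = [[2, 3], [2, 7]]
print(find_out_of_range_ids(examples, len(vocab)))

# Tokens that are not in the vocabulary fall back to the unk id.
print(tokens_to_ids(["good", "unseen"], vocab))
```

If the check reports out-of-range ids, the vocabulary used to build the ids is likely larger than, or different from, the one used to size the embedding layer; rebuilding the ids with the same vocabulary, or sizing the embedding from `len(vocab)`, would be the things to try first.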

Steffy-zxf, Jan 04 '21 02:01

@Steffy-zxf I have run into the same problem. What is the cause? Is there a parameter that needs to be set?

lightCraft2020, Nov 22 '21 06:11

Has this problem been solved? I have run into the same issue and would appreciate a solution.

Maydaytyh, Jan 22 '22 13:01

> @Steffy-zxf I have run into the same problem. What is the cause? Is there a parameter that needs to be set?

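I have run into the same problem. Has it been solved yet, and if so, how? I suspect it is caused by the text being analyzed being too long. Could it be a limit on the length of a data type?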

yanliangyy, Jan 31 '22 08:01