A large custom dataset causes an error. How should I modify the code?
Project link: https://github.com/PaddlePaddle/models/tree/4d87afd6480737b64b5974c9c40a5b1c5a4600b3/PaddleNLP/examples/text_classification/rnn
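I replaced train.tsv, dev.tsv, and test.tsv under C:\Users\Administrator.paddlenlp\datasets\chnsenticorp with my own dataset and then ran training. When the training set has around 30,000 samples in total, there is no error and training finishes and produces a model. With more samples than that, the error below occurs. How should I modify the code? A detailed answer would be much appreciated. Piecing together a dataset of 320,000 samples was not easy! Please help! The error output is as follows:
`step 30/47 - loss: 0.3494 - acc: 0.9693 - 290ms/step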
step 40/47 - loss: 0.3437 - acc: 0.9691 - 301ms/step
Traceback (most recent call last):
File "train.py", line 193, in
File "F:\aanaa\lib\site-packages\paddle\hapi\model.py", line 1503, in fit eval_logs = self._run_one_epoch(eval_loader, cbks, 'eval')
File "F:\aanaa\lib\site-packages\paddle\hapi\model.py", line 1799, in _run_one_epoch data[len(self._inputs):])
File "F:\aanaa\lib\site-packages\paddle\hapi\model.py", line 991, in eval_batch loss = self._adapter.eval_batch(inputs, labels)
File "F:\aanaa\lib\site-packages\paddle\hapi\model.py", line 681, in eval_batch outputs = self.model.network.forward(* [to_variable(x) for x in inputs])
File "F:\aanaa\lib\site-packages\paddlenlp\models\senta.py", line 104, in forward logits = self.model(text, seq_len)
File "F:\aanaa\lib\site-packages\paddle\fluid\dygraph\layers.py", line 884, in call outputs = self.forward(*inputs, **kwargs)
File "F:\aanaa\lib\site-packages\paddlenlp\models\senta.py", line 186, in forward embedded_text = self.embedder(text)
File "F:\aanaa\lib\site-packages\paddle\fluid\dygraph\layers.py", line 884, in call outputs = self.forward(*inputs, **kwargs)
File "F:\aanaa\lib\site-packages\paddle\nn\layer\common.py", line 1289, in forward name=self._name)
File "F:\aanaa\lib\site-packages\paddle\nn\functional\input.py", line 202, in embedding 'remote_prefetch', False, 'padding_idx', padding_idx)
ValueError: (InvalidArgument) Variable value (input) of OP(fluid.layers.embedding) expected >= 0 and < 857580, but got 858325. Please check input value.
[Hint: Expected ids[i] < row_number, but received ids[i]:858325 >= row_number:857580.] (at D:\2.0.0rc1\paddle\paddle/fluid/operators/lookup_table_v2_op.h:81)
[Hint: If you need C++ stacktraces for debugging, please set FLAGS_call_stack_level=2
.]
[operator < lookup_table_v2 > error] `
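A preliminary analysis suggests a size mismatch between the embedding and the input data. I'll check with the model owner about the specific changes needed.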
OK, thank you for your help.
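@yizhipipixia Hello! Judging from the error message, an input word id exceeds the vocabulary size (the embedding received id 858325, but only ids from 0 to 857579 are valid). Please check the input word ids and the vocabulary size.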
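One way to run that check before training is sketched below. This is a minimal, framework-free illustration rather than the example's actual preprocessing code: the toy `vocab` dict, the `[UNK]` fallback token, and the helper names `convert_tokens_to_ids` and `find_out_of_range_ids` are all assumptions made for the sketch. The idea is to scan every converted example for ids outside `[0, num_embeddings)`, which is the condition the embedding op enforces.

```python
def convert_tokens_to_ids(tokens, vocab, unk_token="[UNK]"):
    """Look up each token; unknown tokens fall back to the id of unk_token."""
    unk_id = vocab[unk_token]
    return [vocab.get(tok, unk_id) for tok in tokens]


def find_out_of_range_ids(examples, num_embeddings):
    """Return (example_index, bad_id) pairs for ids outside [0, num_embeddings)."""
    bad = []
    for i, ids in enumerate(examples):
        for word_id in ids:
            if word_id < 0 or word_id >= num_embeddings:
                bad.append((i, word_id))
    return bad


# Toy vocabulary standing in for the real word dictionary (hypothetical data).
vocab = {"[PAD]": 0, "[UNK]": 1, "good": 2, "bad": 3}

# Number of rows in the embedding table; it must be at least max id + 1.
num_embeddings = len(vocab)

examples = [
    convert_tokens_to_ids(["good", "movie"], vocab),  # "movie" falls back to [UNK]
    [2, 7],  # hand-made example containing an id the table cannot hold
]

bad_ids = find_out_of_range_ids(examples, num_embeddings)
print(bad_ids)

# The largest id the vocabulary can produce should also fit in the table.
print(max(vocab.values()) < num_embeddings)
```

If the scan reports any pairs on the real data, the vocabulary used to convert text to ids and the size passed to the embedding layer are out of sync, and the two need to be built from the same word dictionary.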
@Steffy-zxf I've run into the same problem. What is the cause? Is there a parameter that needs to be set?
Has this issue been resolved? I've run into the same problem and am looking for a solution.
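I've run into the same problem. Has it been solved yet, and if so, how? I suspect it is caused by the input text being too long. Could it be a length limit of the data type?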