
bert processor

Open xuezzz opened this issue 4 years ago • 4 comments

When I use the BERT preprocessor to transform my dataset, the following warning appears:

Token indices sequence length is longer than the specified maximum sequence length for this model (694 > 512). Running this sequence through the model will result in indexing errors.

But my dataset doesn't contain sequences that long! It also leads to an error during training. Do you know how to solve this? Thanks! MatchZoo version 1.1.1

xuezzz avatar May 18 '20 12:05 xuezzz

BERT has a maximum sequence length of 512 tokens; your input appears to tokenize to more than 512. Check the token length after tokenization (WordPiece can split a single word into several subtokens, so the token count can exceed the raw word count). A quick solution: if a paragraph is 800 tokens long, split it into two 400-token pieces and tokenize each separately.

RoshanGurung93 avatar May 25 '20 16:05 RoshanGurung93
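The splitting advice above can be sketched in plain Python. The helper below is hypothetical (not part of MatchZoo); it leaves short token lists alone and splits long ones into roughly equal pieces that each fit under BERT's 512-token limit:

```python
def chunk_tokens(tokens, max_len=512):
    """Split a token list into roughly equal pieces no longer than max_len."""
    if len(tokens) <= max_len:
        return [tokens]
    n_chunks = -(-len(tokens) // max_len)   # ceiling division: chunks needed
    size = -(-len(tokens) // n_chunks)      # even chunk size, e.g. 800 -> 400
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]

tokens = ["tok"] * 800
print([len(c) for c in chunk_tokens(tokens)])  # [400, 400]
```

Each chunk can then be fed through the preprocessor separately, at the cost of losing cross-chunk context.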

Hi, I'd like to ask: when I train a model with BERT, at the final trainer.run() step I keep getting the error "Expected tensor for argument #1 'indices' to have scalar type Long; but got torch.cuda.IntTensor instead (while checking arguments for embedding)". I haven't been able to figure out what to change. Could you help me resolve this? Many, many thanks!

xumingying0612 avatar Aug 27 '20 12:08 xumingying0612
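The dtype error above usually means the index tensors reaching nn.Embedding are int32, while PyTorch embedding lookups expect int64 (Long). A minimal sketch of the usual fix, casting the indices before the lookup (the tensor names here are illustrative, not MatchZoo's own):

```python
import torch

embedding = torch.nn.Embedding(num_embeddings=100, embedding_dim=8)

# int32 indices reproduce the "expected Long" complaint on older PyTorch
indices = torch.tensor([[1, 2, 3]], dtype=torch.int32)

out = embedding(indices.long())  # cast to int64 (Long) before the lookup
print(out.shape)  # torch.Size([1, 3, 8])
```

In a MatchZoo pipeline the cast would belong where the batch is built or moved to the GPU, so every index tensor is Long before it reaches the model.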

Hi, I'd like to ask: when I train a model with BERT, at the final trainer.run() step I keep getting the error "Expected tensor for argument #1 'indices' to have scalar type Long; but got torch.cuda.IntTensor instead (while checking arguments for embedding)". I haven't been able to figure out what to change. Could you help me resolve this? Many, many thanks!

Please provide more details, e.g. code snippets

Chriskuei avatar Aug 28 '20 02:08 Chriskuei

[screenshots attached] The error occurs at the last step, trainer.run(), when running https://github.com/NTMC-Community/MatchZoo-py/blob/master/tutorials/ranking/bert.ipynb. Any guidance would be much appreciated, thanks!

xumingying0612 avatar Aug 28 '20 08:08 xumingying0612