Tensorflow-Tutorial Received a label value of -2147483648 which is outside the valid range of [0, 5).

In the Bi-directional LSTM Chinese word segmentation example, I get this error: tensorflow.python.framework.errors_impl.InvalidArgumentError: Received a label value of -2147483648 which is outside the valid range of [0, 5). Label values: -2147483648 -2147483648 2 3 -2147483648 0 0 0 0 0 0 0 0 0 0 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648 2 3 -2147483648 0 0 0 0 0 0 0 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648 2 3 -2147483648 0 0 0 0 0 0 0 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648 2 3 -2147483648 0 0 0 0 0 -2147483648 -2147483648 -2147483648 -2147483648 ... (and so on). I'm using my own dataset, processed into the same format as the sample dataset (今/B 天/M是/M个/M好/E3天/E2气/E), and I get this error. Is it because some sentences in my dataset are too long? How can I fix it?

Jul 28 '18 13:07 parkourcx

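@parkourcx Hi, thanks for the question. I don't think this is a sentence-length problem; rather, the per-character labels produced during data processing are wrong. As I recall, the annotation uses only four tags: s (single-character word), b (beginning of a word), m (middle of a word), and e (end of a word); all padding positions are labeled x. Judging from your error, some of your labels are -2147483648, which must be wrong. I also don't quite understand why you annotated it as (今/B 天/M是/M个/M好/E3天/E2气/E).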

Aug 04 '18 06:08 yongyehuang
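-2147483648 is -2^31, the smallest 32-bit signed integer, so it is almost certainly not a real tag id; one common way it appears is a missing or unmapped value (NaN) being cast to int32. A minimal sketch, assuming the `char/TAG` line format quoted in the question and the five tags named above: map every tag through an explicit dictionary and fail fast on anything unknown. `PAIR_RE` and `line_to_ids` are hypothetical names, not code from the repository.

```python
import re

# Tag set named in the thread: s, b, m, e for characters, x for padding.
TAGS = ['s', 'b', 'm', 'e', 'x']
TAG2ID = {t: i for i, t in enumerate(TAGS)}

# Hypothetical pattern for the "char/TAG" format quoted in the question,
# e.g. "今/B 天/M是/M": one non-space character, a slash, then letters plus optional digits.
PAIR_RE = re.compile(r'(\S)/([A-Za-z]+[0-9]*)')

def line_to_ids(line, tag2id=TAG2ID):
    """Split one annotated line into characters and label ids.

    Raises ValueError on any tag missing from tag2id instead of letting an
    invalid id (such as -2147483648) reach the loss function.
    """
    chars, ids = [], []
    for ch, tag in PAIR_RE.findall(line):
        if tag not in tag2id:
            raise ValueError(f"unknown tag {tag!r} on {ch!r}; expected one of {list(tag2id)}")
        chars.append(ch)
        ids.append(tag2id[tag])
    return chars, ids
```

With the lowercase tag set, the line from the question fails immediately on 'B'. To use uppercase or six-tag annotations, pass a `tag2id` built from that tag list instead, e.g. `{t: i for i, t in enumerate(['S', 'B', 'M', 'E3', 'E2', 'E', 'x'])}`.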

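Thank you very much for the reply! Here's the situation: I'm building a program that splits Classical Chinese text into sentences. I annotated it this way to mark the beginning, middle, and end of each Classical Chinese sentence: /S a single-character sentence, /B the start of a sentence, /M the middle of a sentence, /E the end of a sentence. I think sentence segmentation and word segmentation are both sequence-splitting problems, so with some adjustments your program should be able to segment Classical Chinese into sentences. Is my understanding correct? I then re-prepared my corpus and found that my labels were indeed the problem; the program runs fine now and I'm training the model. I'd also like to ask: if I use a six-tag set (S B M E3 E2 E, meaning single-character sentence, start, middle, third-to-last character, second-to-last character, and end), does the model need any changes besides the corresponding changes to corpus preprocessing? Looking forward to your reply. Best regards!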

Aug 04 '18 06:08 parkourcx
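A sketch of the six-tag scheme described above, assuming one sentence per call. The thread only gives a seven-character example, so the handling of sentences shorter than four characters (B takes priority over E3/E2) is an assumption of this sketch. With the padding tag 'x' added in code, the tag list has seven entries, so presumably class_num would need to become 7 to keep every label inside [0, class_num).

```python
# Six-tag scheme from the thread, plus the padding tag 'x' added in code.
TAGS6 = ['S', 'B', 'M', 'E3', 'E2', 'E', 'x']

def tag_sentence(sentence):
    """Assign one tag per character of a single sentence.

    Assumption (not from the thread): for sentences shorter than 4 characters,
    'B' overrides 'E3'/'E2', so 2 chars -> B E and 3 chars -> B E2 E.
    """
    n = len(sentence)
    if n == 0:
        return []
    if n == 1:
        return ['S']              # single-character sentence
    tags = ['M'] * n              # default: middle of the sentence
    tags[-1] = 'E'                # last character
    if n >= 3:
        tags[-2] = 'E2'           # second-to-last character
    if n >= 4:
        tags[-3] = 'E3'           # third-to-last character
    tags[0] = 'B'                 # first character always marks the start
    return tags

CLASS_NUM6 = len(TAGS6)           # 7 if 'x' is kept for padding
```

For "今天是个好天气" this reproduces the question's annotation: B M M M E3 E2 E.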

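@parkourcx That should be fine. You could compare this tagging against the four-tag s b m e scheme and see which works better. As for the model, this one is fairly simple; you could also try an LSTM+CRF model (I haven't run one myself...), which is quite commonly used for sequence labeling.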

Aug 04 '18 06:08 yongyehuang
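In an LSTM+CRF model, the CRF layer learns a tag-to-tag transition score matrix and finds the best tag sequence with the Viterbi algorithm. The sketch below shows only that decoding step in plain Python; it is not code from the repository, training the CRF is not shown, and the input format (per-character log-probabilities as dicts, transition log-scores keyed like 'SB') is an assumption.

```python
def viterbi(emissions, transitions, tags):
    """Return the highest-scoring tag path.

    emissions:   list of dicts, one per character, mapping tag -> log-score
                 (e.g. the per-character output of the BiLSTM in log space).
    transitions: dict mapping previous tag + next tag (e.g. 'SB') -> log-score;
                 pairs missing from the dict are treated as impossible.
    """
    neg_inf = float('-inf')
    score = {t: emissions[0][t] for t in tags}   # best score of a path ending in t
    back = []                                    # back-pointers, one dict per step
    for em in emissions[1:]:
        new_score, ptr = {}, {}
        for t in tags:
            best_prev, best = tags[0], neg_inf
            for p in tags:
                s = score[p] + transitions.get(p + t, neg_inf)
                if s > best:
                    best_prev, best = p, s
            new_score[t] = best + em[t]
            ptr[t] = best_prev
        score = new_score
        back.append(ptr)
    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda t: score[t])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]
```

The transition scores are what prevent invalid sequences: if the per-character scores alone would pick B E E, a transition table without 'EE' forces a valid path such as B M E instead.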

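OK, I'll give it a try. One more thing: I don't quite understand what you said earlier, "all padding positions are labeled x". I changed tags=['s','b','m','e','x'] in the original program to tags=['S','B','M','E']. Will that cause any problems?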

Aug 04 '18 06:08 parkourcx

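@parkourcx Padding makes every sample the same length: the missing positions in a shorter sequence are filled with a special symbol, and that symbol is labeled with a new label of its own. So you should keep using tags=['s','b','m','e','x'].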

Aug 04 '18 06:08 yongyehuang
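A minimal sketch of the padding step described above, assuming tag sequences are padded to a fixed `max_len` (truncating longer sequences is an additional assumption of this sketch). `pad_and_encode` is a hypothetical helper, not the repository's function. Because 'x' is the fifth entry, class_num = len(TAGS) = 5, which matches the [0, 5) range in the error.

```python
TAGS = ['s', 'b', 'm', 'e', 'x']            # 'x' labels padded positions
TAG2ID = {t: i for i, t in enumerate(TAGS)}
CLASS_NUM = len(TAGS)                       # 5, the upper bound in "[0, 5)"

def pad_and_encode(tags, max_len, pad_tag='x'):
    """Pad (or truncate) one tag sequence to max_len and convert it to ids."""
    kept = list(tags[:max_len])                       # truncation: assumption of this sketch
    padded = kept + [pad_tag] * (max_len - len(kept)) # fill the rest with the padding tag
    return [TAG2ID[t] for t in padded]
```

For example, `pad_and_encode(['b', 'e', 's'], 5)` gives `[1, 3, 0, 4, 4]`: the two padded positions get the id of 'x'.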

I don't have the x tag now. Does that mean my results are completely wrong? And class_num=5 because there are five labels, right?

Aug 04 '18 07:08 parkourcx

@parkourcx 'x' is a tag added by the code during processing; it is not one of the tags in the annotated data.

Aug 04 '18 07:08 yongyehuang
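Following the explanation above, a hedged sketch: 'x' never appears in the corpus but is appended to the tag list in code, so class_num counts it too. Without 'x', padded positions would have no valid label id. The names below are hypothetical; a cheap check like `check_labels` catches out-of-range ids such as -2147483648 before TensorFlow does.

```python
CORPUS_TAGS = ['S', 'B', 'M', 'E']   # tags that occur in the annotated corpus
TAGS = CORPUS_TAGS + ['x']           # 'x' is appended in code, only for padding
CLASS_NUM = len(TAGS)                # 5 here; should match the model's number of output classes

def check_labels(batch, class_num=CLASS_NUM):
    """Raise if any label id lies outside [0, class_num), the range the error message reports."""
    bad = [v for row in batch for v in row if not 0 <= v < class_num]
    if bad:
        raise ValueError(f"{len(bad)} label(s) outside [0, {class_num}), e.g. {bad[:5]}")
```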

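If the tags list doesn't contain x, what is the effect? Should class_num then be 4 instead of 5? Right now, during preprocessing, I only put the four tags S, B, M, E in tags. Do I need to add x and reprocess the corpus?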

Aug 04 '18 07:08 parkourcx

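Sorry to bother you. I'd like to ask about the dict set up before computing the state transition matrix:
A = {
'SB': 0,
'SS': 0,
'ES': 0,
'BE': 0,
'BM': 0,
'ME': 0,
'MM': 0,
'EB': 0
}
What do SS, SB, ES, etc. mean here? I don't really understand.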

Aug 10 '18 12:08 parkourcx
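Each key in A names a pair of adjacent tags: 'SB' stands for a character tagged S followed by one tagged B, 'EB' for an E followed by a B, and so on. Pairs such as 'BB', 'BS' or 'EE' never occur in a valid S/B/M/E labeling, which is presumably why only these eight keys appear. One common way to turn such counts into transition probabilities is a maximum-likelihood estimate: divide each pair count by the total count of pairs that start with the same tag. A sketch under those assumptions, with `transition_probs` as a hypothetical name and unpadded, single-letter tag sequences as input (the six-tag scheme would need its own list of allowed pairs):

```python
from collections import Counter, defaultdict

# The eight keys from the thread: previous tag + next tag for adjacent characters.
ALLOWED = ['SB', 'SS', 'ES', 'BE', 'BM', 'ME', 'MM', 'EB']

def transition_probs(tag_sequences, allowed=ALLOWED):
    """Estimate P(next tag | previous tag) by counting adjacent tag pairs."""
    counts = Counter()
    for tags in tag_sequences:
        for prev, nxt in zip(tags, tags[1:]):
            counts[prev + nxt] += 1
    # Total number of allowed pairs that start with each previous tag.
    totals = defaultdict(int)
    for key in allowed:
        totals[key[0]] += counts[key]
    return {key: counts[key] / totals[key[0]] if totals[key[0]] else 0.0
            for key in allowed}
```

For the single sequence B E S B M E, B is followed once by E and once by M, so 'BE' and 'BM' each get 0.5, while 'SB', 'ES' and 'ME' get 1.0.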

Is this how the transition probability matrix is computed?

Aug 10 '18 12:08 parkourcx
