ChineseNER

After reducing the dataset, error: ValueError: setting an array element with a sequence.

Open SanSLee opened this issue 6 years ago • 11 comments

Traceback (most recent call last):
  File "", line 1, in
    runfile('E:/【重点代码】ChineseNER-master-bishe/Gradu_Prj/main.py', wdir='E:/【重点代码】ChineseNER-master-bishe/Gradu_Prj')
  File "E:\anaconda INSTALL\envs\tensorflow\lib\site-packages\spyder\utils\site\sitecustomize.py", line 705, in runfile
    execfile(filename, namespace)
  File "E:\anaconda INSTALL\envs\tensorflow\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)
  File "E:/【重点代码】ChineseNER-master-bishe/Gradu_Prj/main.py", line 246, in
    train()
  File "E:/【重点代码】ChineseNER-master-bishe/Gradu_Prj/main.py", line 192, in train
    step, batch_loss = model.run_step(sess, True, batch)
  File "E:\【重点代码】ChineseNER-master-bishe\Gradu_Prj\model.py", line 221, in run_step
    feed_dict)
  File "E:\anaconda INSTALL\envs\tensorflow\lib\site-packages\tensorflow\python\client\session.py", line 895, in run
    run_metadata_ptr)
  File "E:\anaconda INSTALL\envs\tensorflow\lib\site-packages\tensorflow\python\client\session.py", line 1097, in _run
    np_val = np.asarray(subfeed_val, dtype=subfeed_dtype)
  File "E:\anaconda INSTALL\envs\tensorflow\lib\site-packages\numpy\core\numeric.py", line 492, in asarray
    return array(a, dtype, copy=False, order=order)
ValueError: setting an array element with a sequence.

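I deleted some of the sentences from the three files example.train, example.test, and example.dev and saved the results as txt documents, but the program now fails at run time.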
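For context, the traceback ends in NumPy's `asarray`, which raises this `ValueError` when a nested list whose rows have unequal lengths is converted to a numeric array. Below is a minimal sketch for spotting such rows before a batch is fed to the session. The helper name `ragged_positions` and the sample batch are hypothetical and not part of ChineseNER.

```python
def ragged_positions(rows):
    """Return the indexes of rows whose length differs from the first row's.

    Hypothetical helper: an empty result means the nested list is rectangular
    and can be converted to a regular numeric array.
    """
    if not rows:
        return []
    expected = len(rows[0])
    return [i for i, row in enumerate(rows) if len(row) != expected]


# Made-up batch of id lists; the second row is one element short.
sample_batch = [[1, 2, 3], [4, 5], [6, 7, 8]]
bad_rows = ragged_positions(sample_batch)
print("rows with unexpected length:", bad_rows)
```

Running a check like this on each field of a batch just before the `run_step` call should show which sentences produce rows of the wrong length.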

SanSLee avatar Mar 22 '18 14:03 SanSLee

Hello, I hit this error at run time too, although I did not change the data files. Have you solved this problem?

bearchj avatar Jun 22 '18 06:06 bearchj

Have you solved this problem?

amoursmile avatar Nov 09 '18 09:11 amoursmile

I am also using a fairly small dataset. Have you solved the problem?

Jenny181212 avatar Dec 16 '18 08:12 Jenny181212

I would like to ask about this error. Can it be fixed?

agilelab avatar Apr 03 '19 06:04 agilelab

May I ask the three of you how this problem was solved?

agilelab avatar Apr 03 '19 06:04 agilelab

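The main cause is that the dataset's annotation format is wrong. Line endings on Windows are \r\n; replacing them with \n fixes it. Also, write the space in the middle of each line as \t.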
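A minimal sketch of that cleanup, assuming each non-blank line of the data file holds a character and its tag separated by whitespace, and that blank lines separate sentences. The function names, the `utf-8` encoding, and the sample text are assumptions for illustration, not part of ChineseNER.

```python
def normalize_annotations(text):
    """Convert \\r\\n (and stray \\r) to \\n and join each line's fields with a tab."""
    lines = text.replace("\r\n", "\n").replace("\r", "\n").split("\n")
    fixed = []
    for line in lines:
        parts = line.split()
        if len(parts) >= 2:
            # e.g. "北 B-LOC" becomes "北\tB-LOC"
            fixed.append("\t".join(parts))
        else:
            # blank separator lines (or malformed single-field lines) are kept, stripped
            fixed.append(line.strip())
    return "\n".join(fixed)


def rewrite_file(path, encoding="utf-8"):
    """Rewrite one data file in place with normalized line endings and separators."""
    # newline="" keeps the original line endings on read so they can be replaced here
    with open(path, "r", encoding=encoding, newline="") as f:
        text = f.read()
    # newline="\n" stops Python from turning "\n" back into "\r\n" on Windows
    with open(path, "w", encoding=encoding, newline="\n") as f:
        f.write(normalize_annotations(text))


sample = "我 O\r\n在 O\r\n\r\n北 B-LOC\r\n"
cleaned = normalize_annotations(sample)
```

`rewrite_file` would be called once for each of example.train, example.dev, and example.test; keep a backup first, since it overwrites the file.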

wakanow avatar Apr 15 '19 00:04 wakanow

Could you explain in more detail? I'm a complete beginner. @wakanow

mz2sj avatar Apr 25 '19 07:04 mz2sj

@agilelab Have you solved it?

mz2sj avatar Apr 25 '19 07:04 mz2sj

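After tracing this problem carefully, I found that it arises in the prepare_dataset function in loader.py. I am not sure what causes the four lists in an output entry to end up with different lengths. The apparent cause is that jieba, with low probability, segments a string of, say, 10 characters into pieces whose total length exceeds 10. My guess is that the 10 characters include a special character, but I could not find one, so I added a check. The modified code is as follows: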

def prepare_dataset(sentences, char_to_id, tag_to_id, lower=False, train=True):
    """
    Prepare the dataset. Return a list of lists of dictionaries containing:
        - word indexes
        - word char indexes
        - tag indexes
    The returned data = [[sentence, ids of the sentence's characters in the
    training character mapping, list of word-segmentation positions in the
    sentence, ids of the sentence's tags in the training tag mapping], ...]
    """

    none_index = tag_to_id["O"]

    def f(x):
        return x.lower() if lower else x

    data = []
    for s in sentences:
        string = [w[0] for w in s]
        chars = [char_to_id[f(w) if f(w) in char_to_id else '<UNK>']
                 for w in string]
        segs = get_seg_features("".join(string))
        if train:
            tags = [tag_to_id[w[-1]] for w in s]
        else:
            tags = [none_index for _ in chars]

        # If the four lists cannot be aligned, i.e. their lengths differ,
        # discard the sentence. JAMES 2019-04-03
        if len(string) == len(chars) == len(segs) == len(tags):
            data.append([string, chars, segs, tags])
        else:
            st = "".join(string)
            print("Sentence [{0}]: annotation data error".format(st))

    return data
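The special-character guess above can be tested directly. The sketch below lists control, format, and separator characters in a sentence using Python's standard `unicodedata` module. The helper name `suspicious_chars` is hypothetical, and whether such characters are really what misaligns the segmentation here is an assumption, not something confirmed in this thread.

```python
import unicodedata


def suspicious_chars(text):
    """Return (index, repr, category) for control, format, and separator characters.

    Unicode general categories starting with "C" (e.g. Cc, Cf) or "Z"
    (e.g. Zs) cover carriage returns, zero-width characters, and spaces.
    """
    found = []
    for i, ch in enumerate(text):
        category = unicodedata.category(ch)
        if category[0] in ("C", "Z"):
            found.append((i, repr(ch), category))
    return found


# A stray carriage return left over from a Windows line ending is reported.
report = suspicious_chars("北京\r")
print(report)
```

Calling this on `"".join(string)` in the `else` branch of the length check would show whether a discarded sentence contains any such character.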

agilelab avatar Apr 30 '19 02:04 agilelab

Thanks @agilelab

mz2sj avatar May 02 '19 07:05 mz2sj