
SpanProto: recall is always 1.0

Open JayShJi opened this issue 3 years ago • 4 comments

Hello!

After running the script you provided, I found that recall is always 1.0. Could it be caused by the following line of code? (The ground-truth spans are concatenated into the predictions here, so the labels leak into the later evaluation.) https://github.com/wjn1996/SpanProto/blob/f1e0acb8672f0bfcbb7c827c48b06b3e8ccb295a/models/span_proto.py#L588

In addition, after changing this line to query_all_spans = query_predict_spans, the results differ considerably from those reported in the paper, and I am not sure which side is at fault. FEW-NERD 5-way 1-shot: inter: span_f1 0.5826, class_f1 0.4618; intra: span_f1 0.4606, class_f1 0.3548.
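The leakage effect described above can be sketched with a small, self-contained example. The `span_recall` helper below is hypothetical (it is not SpanProto's actual evaluation code); it only illustrates why concatenating the gold spans into the candidate set forces recall to 1.0 regardless of model quality:

```python
def span_recall(predicted, gold):
    """Recall = fraction of gold spans that appear among the predicted spans."""
    predicted, gold = set(predicted), set(gold)
    if not gold:
        return 1.0
    return len(predicted & gold) / len(gold)

gold_spans = [(0, 2), (5, 7), (9, 9)]
model_spans = [(0, 2), (3, 4)]  # the model only finds one of three gold spans

# Evaluating on the model's predictions alone gives the true recall (1/3 here).
true_recall = span_recall(model_spans, gold_spans)

# If the gold spans are concatenated into the candidate set, every gold span
# is trivially "predicted", so recall is always 1.0.
leaky_spans = model_spans + gold_spans
leaky_recall = span_recall(leaky_spans, gold_spans)
```

This is why replacing the concatenation with `query_all_spans = query_predict_spans` makes recall drop to the model's real span-detection performance.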

JayShJi · Nov 08 '22 13:11

(quoting the original issue report above)

Hi, could you tell me how you reproduced this? I can't get it to run. Is there a problem with the source code?

Shajiu · Dec 07 '22 06:12

How did you get it running? Here is what I get:

You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Fail to resize token embeddings.
Running tokenizer on dataset: 0%| | 0/20 [00:00<?, ?ba/s]
Traceback (most recent call last):
  File "/code/SpanProto/nlp_trainer.py", line 285, in <module>
    main()
  File "/code/SpanProto/nlp_trainer.py", line 135, in main
    tokenized_datasets = processor.get_tokenized_datasets()
  File "/code/SpanProto/processor/ProcessorBase.py", line 308, in get_tokenized_datasets
    raw_datasets = raw_datasets.map(
  File "/home/shajiu/anaconda3/lib/python3.9/site-packages/datasets/dataset_dict.py", line 494, in map
    {
  File "/home/shajiu/anaconda3/lib/python3.9/site-packages/datasets/dataset_dict.py", line 495, in <dictcomp>
    k: dataset.map(
  File "/home/shajiu/anaconda3/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 2092, in map
    return self._map_single(
  File "/home/shajiu/anaconda3/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 518, in wrapper
    out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
  File "/home/shajiu/anaconda3/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 485, in wrapper
    out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
  File "/home/shajiu/anaconda3/lib/python3.9/site-packages/datasets/fingerprint.py", line 411, in wrapper
    out = func(self, *args, **kwargs)
  File "/home/shajiu/anaconda3/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 2486, in _map_single
    writer.write_batch(batch)
  File "/home/shajiu/anaconda3/lib/python3.9/site-packages/datasets/arrow_writer.py", line 458, in write_batch
    pa_table = pa.Table.from_pydict(typed_sequence_examples)
  File "pyarrow/table.pxi", line 1868, in pyarrow.lib.Table.from_pydict
  File "pyarrow/table.pxi", line 2658, in pyarrow.lib._from_pydict
  File "pyarrow/array.pxi", line 342, in pyarrow.lib.asarray
  File "pyarrow/array.pxi", line 230, in pyarrow.lib.array
  File "pyarrow/array.pxi", line 110, in pyarrow.lib._handle_arrow_array_protocol
  File "/home/shajiu/anaconda3/lib/python3.9/site-packages/datasets/arrow_writer.py", line 140, in arrow_array
    out = pa.array(cast_to_python_objects(self.data, only_1d_for_numpy=True), type=type)
  File "pyarrow/array.pxi", line 316, in pyarrow.lib.array
  File "pyarrow/array.pxi", line 39, in pyarrow.lib._sequence_to_array
  File "pyarrow/error.pxi", line 143, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 99, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Could not convert {'input_ids': [[101, 23236, …], …], 'token_type_ids': [[0, 0, …], …], 'attention_mask': [[1, 1, …], …], 'offset_mapping': [[(0, 0), (0, 4), …], …]} with type BatchEncoding: did not recognize Python value type when inferring an Arrow data type

Shajiu · Dec 11 '22 13:12

(quoting the original issue report above)

I ran into the same problem and hope the author can answer it.

liyongqi2002 · Dec 14 '22 11:12

(quoting the original issue report above)

This is probably a shared-subsequence issue introduced by tokenization, which is what drives the recall to 1.

swaggy66 · Nov 19 '23 06:11