Chinese_Coreference_Resolution issues

Many UNK tokens in the official data

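Hi, the 128-length data you provided looks fine, but when I process the v4_gold_conll data with this code (https://github.com/mandarjoshi90/coref), the resulting jsonlines file contains a lot of UNK tokens. Does your 256-length data also contain a lot of UNK?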
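A first step is to measure how large the share of `[UNK]` subtokens actually is. The sketch below assumes the jsonlines layout of mandarjoshi90/coref, where each line is one document whose `sentences` field holds segments of subtokens wrapped in `[CLS]`/`[SEP]`; `unk_ratio` is a hypothetical helper, not part of either project. One thing worth checking (not confirmed in this thread) is which `vocab.txt` the preprocessing was given: as far as we know, that repository's setup scripts point at an English cased vocabulary, and tokenizing Chinese text with an English vocabulary would produce many `[UNK]`s.

```python
import json


def unk_ratio(lines, unk="[UNK]", special=("[CLS]", "[SEP]")):
    """Fraction of non-special subtokens equal to `unk` across all documents.

    `lines` is any iterable of jsonlines strings, e.g. an open file object.
    Assumes each document stores its subtokens under the "sentences" key.
    """
    total = unk_count = 0
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        doc = json.loads(line)
        for segment in doc["sentences"]:
            for tok in segment:
                if tok in special:
                    continue  # [CLS]/[SEP] are added by preprocessing, not text
                total += 1
                if tok == unk:
                    unk_count += 1
    return unk_count / total if total else 0.0
```

Usage: `with open("train.chinese.128.jsonlines", encoding="utf-8") as f: print(unk_ratio(f))`, where the file name is only an example.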

BugMaker-99

About rare characters

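Hi, I tried your project and found that when a rare character is not in vocab.txt, a KeyError is raised. I tried using add_tokens, but that method does not exist. Is there any way to solve this?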
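A minimal sketch, not the project's code: the KeyError comes from indexing the vocabulary directly (`vocab[ch]`), so looking characters up with `dict.get` and an `[UNK]` fallback avoids the crash. If the rare characters matter, a common workaround for BERT-style vocabularies is to overwrite `[unusedN]` placeholder entries in vocab.txt with the new characters; this keeps the vocabulary size, and therefore the pretrained embedding matrix, unchanged, while the new characters start from the untrained `[unused]` embeddings. The helper names `load_vocab`, `fill_unused_slots` and `chars_to_ids` are hypothetical, and the sketch assumes vocab.txt contains `[unusedN]` entries, as many Chinese BERT vocabularies do.

```python
def load_vocab(lines):
    """vocab.txt format: one token per line; the line index is the token id."""
    return {tok.rstrip("\n"): i for i, tok in enumerate(lines)}


def fill_unused_slots(vocab_lines, new_chars):
    """Replace [unusedN] entries with characters missing from the vocabulary.

    Returns the new list of vocab lines (without newlines); ids of all
    existing tokens stay the same.
    """
    lines = [line.rstrip("\n") for line in vocab_lines]
    existing = set(lines)
    # Deduplicate while keeping order; skip characters already present.
    pending = [c for c in dict.fromkeys(new_chars) if c not in existing]
    for i, tok in enumerate(lines):
        if not pending:
            break
        if tok.startswith("[unused"):
            lines[i] = pending.pop(0)
    return lines


def chars_to_ids(text, vocab, unk="[UNK]"):
    """Map each character to its id, falling back to [UNK] instead of raising KeyError."""
    unk_id = vocab[unk]
    return [vocab.get(ch, unk_id) for ch in text]
```

After rewriting the file with `fill_unused_slots`, the tokenizer has to be reloaded from the new vocab.txt so the added characters are picked up.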

bubblelcc

Dataset processing

How should I convert my own txt dataset into the format used in this project? @troublemaker-r
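The thread does not document the project's input format. The sketch below assumes it follows the jsonlines layout produced by mandarjoshi90/coref's preprocessing (one JSON object per document, with fields including `doc_key`, `sentences` as segments of subtokens wrapped in `[CLS]`/`[SEP]`, `speakers`, `clusters`, `sentence_map` and `subtoken_map`) and that Chinese text is tokenized one character per token. `split_sentences` and `txt_to_example` are hypothetical helpers; the speaker placeholders `"-"` and `"[SPL]"` are assumptions too. For simplicity it cuts segments at a fixed length rather than at sentence boundaries, and it leaves `clusters` empty, so the output is only suitable as prediction input, not as training data.

```python
import json
import re


def split_sentences(text):
    # Split after Chinese sentence-final punctuation, keeping it with its sentence.
    parts = re.split(r"(?<=[。！？])", text)
    return [p.strip() for p in parts if p.strip()]


def txt_to_example(doc_key, text, max_segment_len=128):
    """Hypothetical converter: one document of raw Chinese text -> one jsonlines record."""
    budget = max_segment_len - 2  # leave room for [CLS] and [SEP]
    # (character, sentence index) pairs; one token per non-space character.
    chars = [(c, sent_id)
             for sent_id, sent in enumerate(split_sentences(text))
             for c in sent if not c.isspace()]
    example = {"doc_key": doc_key, "sentences": [], "speakers": [],
               "clusters": [], "sentence_map": [], "subtoken_map": []}
    for start in range(0, len(chars), budget):
        chunk = chars[start:start + budget]
        sent_ids = [s for _, s in chunk]
        tok_ids = list(range(start, start + len(chunk)))
        example["sentences"].append(["[CLS]"] + [c for c, _ in chunk] + ["[SEP]"])
        example["speakers"].append(["[SPL]"] + ["-"] * len(chunk) + ["[SPL]"])
        # [CLS]/[SEP] borrow the sentence and token index of their neighbours.
        example["sentence_map"].extend([sent_ids[0]] + sent_ids + [sent_ids[-1]])
        example["subtoken_map"].extend([tok_ids[0]] + tok_ids + [tok_ids[-1]])
    return example


if __name__ == "__main__":
    record = txt_to_example("my_doc_0", "我喜欢他。他也喜欢我！", max_segment_len=8)
    print(json.dumps(record, ensure_ascii=False))
```

Writing one `json.dumps(record, ensure_ascii=False)` per line gives a .jsonlines file; annotated data would additionally need `clusters` filled with `[start, end]` subtoken spans.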

Davidup1

RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)

This error is raised when running training.

iooops

What does model_heads in the config mean?

When building span embeddings, the extra span features include use_segment_len and model_heads. Could someone explain what model_heads means?
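For context: in the span representation of Lee et al. (2017), on which mandarjoshi90/coref builds, span i is represented as g_i = [x_START(i); x_END(i); x̂_i; φ(i)], where φ(i) holds extra features such as span width and x̂_i is a "soft head word": an attention-weighted sum of the token vectors inside the span, with weights a_{i,t} = softmax over t in the span of α_t, and α_t a learned score of x_t. As far as we can tell from the e2e-coref/mandarjoshi90 code, `model_heads` switches the x̂_i term on or off; that mapping is our reading, not something stated in this project. The sketch below uses plain lists, takes the scores α_t as given instead of computing them with a feed-forward network, and the function names are hypothetical.

```python
import math


def soft_head(span_token_embs, head_scores):
    """Attention-weighted head vector of one span.

    span_token_embs: token vectors x_t for t in [START(i), END(i)].
    head_scores: unnormalized scores alpha_t for the same tokens.
    """
    m = max(head_scores)
    exps = [math.exp(a - m) for a in head_scores]  # numerically stable softmax
    z = sum(exps)
    weights = [e / z for e in exps]
    dim = len(span_token_embs[0])
    return [sum(w * x[d] for w, x in zip(weights, span_token_embs)) for d in range(dim)]


def span_embedding(token_embs, start, end, head_scores, features, use_heads=True):
    """g_i = [x_start; x_end; (soft head if use_heads); features], via list concatenation."""
    parts = token_embs[start] + token_embs[end]
    if use_heads:  # the role we assume model_heads plays
        parts = parts + soft_head(token_embs[start:end + 1], head_scores[start:end + 1])
    return parts + features
```

With equal scores the head is the plain average of the span's tokens; a strongly peaked score makes it collapse onto a single token, which is the "head word" intuition.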

cuikai-ai

Found too many repeated mentions (> 10) in the response, so refusing to score

When does "Found too many repeated mentions (> 10) in the response, so refusing to score" occur? Would changing fnn-size help? I lowered fnn-size earlier so that training would fit in GPU memory.
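As we understand the CoNLL reference scorer, this message refers to the response, i.e. the predicted clusters: it counts mentions that occur more than once (for example the same span placed in two predicted clusters) and refuses to score once that count exceeds 10. Since the problem lies in the predictions, it is worth inspecting them directly. A minimal diagnostic sketch; `repeated_mentions` and `dedupe_clusters` are hypothetical helpers, and clusters are assumed to be lists of `[start, end]` spans. Keeping only the first occurrence is a workaround for getting a score, not a fix for the model.

```python
from collections import Counter


def repeated_mentions(predicted_clusters):
    """Return (start, end) mentions that occur more than once across all clusters."""
    counts = Counter(tuple(m) for cluster in predicted_clusters for m in cluster)
    return sorted(m for m, c in counts.items() if c > 1)


def dedupe_clusters(predicted_clusters):
    """Keep each mention only in the first cluster it appears in.

    Clusters left with fewer than two mentions are dropped, since a
    coreference cluster needs at least two mentions.
    """
    seen = set()
    result = []
    for cluster in predicted_clusters:
        kept = []
        for m in cluster:
            key = tuple(m)
            if key not in seen:
                seen.add(key)
                kept.append(list(m))
        if len(kept) > 1:
            result.append(kept)
    return result
```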

mtang398

UnicodeEncodeError: 'charmap' codec can't encode characters in position 25-29: character maps to <undefined>

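Has anyone else run into this problem? The code runs fine on Ubuntu and on one Windows machine, and files are opened with utf-8 encoding, but on another computer abroad it raises this error. ![image](https://user-images.githubusercontent.com/69768456/228180035-20d0f7f9-a67b-4f3f-8fca-0a86ed507f12.png)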
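The `'charmap'` codec in the message is Python's name for legacy single-byte Windows code pages such as cp1252, which suggests the failing machine uses a non-Chinese, non-UTF-8 locale. Two likely sources are an `open()` for writing (or reading) that still relies on the locale default, and `print()` of Chinese text to a console using that code page. A minimal sketch under that assumption; the helper names are hypothetical. Setting the environment variable `PYTHONUTF8=1` (Python 3.7+ UTF-8 mode) is an alternative that changes the defaults without editing code.

```python
import sys


def ensure_utf8_stdout():
    # print() of Chinese text can raise UnicodeEncodeError on a console using
    # a legacy code page; TextIOWrapper.reconfigure (Python 3.7+) switches it.
    if hasattr(sys.stdout, "reconfigure"):
        sys.stdout.reconfigure(encoding="utf-8", errors="replace")


def write_lines_utf8(path, lines):
    # Pass the encoding explicitly on every open(), including writes; the
    # default comes from the locale and differs between machines.
    with open(path, "w", encoding="utf-8") as f:
        for line in lines:
            f.write(line + "\n")
```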

learner-crapy

Hyperparameters set to match the official coref, but F1 will not go above 70. What could be the problem?

I trained on the Chinese and English OntoNotes data separately with a single RTX 4090 and got the following results. Chinese (RoBERTa_zh_L12_PyTorch): ![image](https://user-images.githubusercontent.com/69768456/228712981-f4b4357d-ba0f-4101-aaa3-74e44438d219.png) English (spanbert_base): ![image](https://user-images.githubusercontent.com/69768456/228712779-6616aaf4-29b8-46ff-99c6-a1cd9f257f1a.png) Parameters used: `# Computation limits. max_top_antecedents = 50 max_training_sentences = 11 top_span_ratio = 0.4 max_num_speakers = 20 max_segment_len = 128 # Learning bert_learning_rate = 1e-05 task_learning_rate...

learner-crapy

Permission denied: 'conll-2012/scorer/v8.01/scorer.pl'

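Quite a few people have asked about this, so after trying it myself I am opening an issue.
1. Open the directory containing the file and change the settings as follows: ![image](https://user-images.githubusercontent.com/69768456/227858961-7a118e75-95a3-48b2-a936-bed9a9b3739c.png)
2. I also ran `sudo chmod 777 ./scorer.pl`. I am not sure whether this is necessary, but with both steps done the run completed. ![image](https://user-images.githubusercontent.com/69768456/227859367-4908f780-9a02-4400-ab4c-21d72e67d367.png)
Note: I have not tried this on Windows, but presumably changing the read/write permissions works the same way.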
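The error suggests the evaluation code launches `scorer.pl` directly as a program, which requires the execute bit. Adding execute permission (`chmod +x scorer.pl`) is likely enough; `chmod 777` additionally makes the file world-writable, which should not be needed. The same can be done from Python, for example in a setup script. A minimal sketch for Linux/macOS; `make_executable` is a hypothetical helper, and on Windows these permission bits have no effect (there, invoking the script through `perl` explicitly is a likely alternative).

```python
import os
import stat


def make_executable(path):
    # Equivalent to `chmod a+x path`: keep the existing mode bits and add
    # execute permission for user, group and others.
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
```

Usage: `make_executable("conll-2012/scorer/v8.01/scorer.pl")`, run from the project root.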

learner-crapy

An error is raised during validation while training; I don't understand what the basic workflow is