e2e-coref

What does the cluster in the json file mean?

Open MENGHAH opened this issue 5 years ago • 3 comments

I generated the jsonlines files with setup_training.sh, but I can't understand what the clusters in the json files mean. Can you explain it to me?

MENGHAH, May 17 '19 08:05

I have the same question

rainyrainyguo, Jul 30 '19 08:07

Hello,

I had the same question, and it took me some time to figure it out.

These are the coreference clusters from the actual gold files. The "clusters" key in each jsonlines entry holds the list of clusters present in the original file, and each cluster is a list of mentions. Each mention is represented by its start and end token index in the original file.

For example, the first entry in test.jsonlines is for the file "bc/cctv/00/cctv_0005_0". Its clusters are: [[[57, 59], [25, 27], [42, 44]], [[19, 23], [16, 16]], [[83, 83], [82, 82]]].

Here, the mentions [19, 23] and [16, 16] are in the same cluster. The other two clusters are [[57, 59], [25, 27], [42, 44]] and [[83, 83], [82, 82]]. The mention

'the', 'Chinese', 'securities', 'regulatory', 'department'

is represented by its start and end index, [19, 23]. That span covers five tokens, so the end index is inclusive. The same applies to the other mentions.
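To check a file yourself, something like the following sketch may help. It maps every [start, end] pair in "clusters" back to its tokens. It assumes each entry also has a "sentences" field holding one list of tokens per sentence, and that the indices count tokens across the whole document rather than per sentence; neither is confirmed in this thread, so verify against your own files. The helper name `resolve_clusters` and the toy document are made up for illustration.

```python
import json
from itertools import chain


def resolve_clusters(doc):
    """Replace each [start, end] mention with the tokens it covers.

    Assumes doc["sentences"] is a list of token lists (assumed field name)
    and that mention indices are document-level with an inclusive end.
    """
    tokens = list(chain.from_iterable(doc["sentences"]))
    return [
        [tokens[start:end + 1] for start, end in cluster]
        for cluster in doc["clusters"]
    ]


# Toy entry for illustration only; it is not taken from the real data.
toy_line = json.dumps({
    "doc_key": "toy/doc_0",  # assumed field name
    "sentences": [
        ["Alice", "said", "she", "was", "tired", "."],
        ["The", "old", "dog", "wagged", "its", "tail", "."],
    ],
    "clusters": [[[0, 0], [2, 2]], [[6, 8], [10, 10]]],
})

doc = json.loads(toy_line)
resolved = resolve_clusters(doc)
for cluster in resolved:
    print([" ".join(mention) for mention in cluster])
```

For a real file, read it line by line and call `json.loads` on each line, since every line of a jsonlines file is a separate JSON document.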

I hope this helps other people as well.

Thanks, Onkar

oapandit, Sep 18 '19 08:09

@MENGHAH Could you please send me these files: train.jsonlines, test.jsonlines, and dev.jsonlines?

liyaoshigehaoren, Jun 21 '20 01:06