ACE05 dataset

Open octavian-ganea opened this issue 8 years ago • 15 comments

Hi. Is there a plan to include the ACE05 dataset that has annotations for both coreference resolution and entity linking? It is quite a big dataset. Details: http://www.aclweb.org/anthology/W10-3503

octavian-ganea avatar Aug 30 '16 17:08 octavian-ganea

We (@tortugaAttack ;) ) will check the dataset and include it if eligible.

RicardoUsbeck avatar Aug 31 '16 07:08 RicardoUsbeck

Dataset info page: https://catalog.ldc.upenn.edu/LDC2006T06

From my point of view, we would need an "LDC User Agreement for Non-Members".

Example: https://catalog.ldc.upenn.edu/desc/addenda/LDC2006T06.txt

MichaelRoeder avatar Aug 31 '16 08:08 MichaelRoeder

Signing the license as Leipzig University shouldn't be a problem, since we already use other LDC datasets.

RicardoUsbeck avatar Aug 31 '16 08:08 RicardoUsbeck

Okay. However, let's make sure that we can use the dataset. It contains a lot of markup, e.g., relations, dates, and pronouns. Just take a look at the example and you will see what I mean. We should check whether the named entities that are marked in the text can be used. Otherwise, it will take a lot of effort to raise the dataset quality.

MichaelRoeder avatar Aug 31 '16 08:08 MichaelRoeder

Oh, I see, there are no links to Wikipedia, or am I missing something, @octavian-ganea?

RicardoUsbeck avatar Aug 31 '16 09:08 RicardoUsbeck

@RicardoUsbeck I have the Wiki annotations, which are separate from the original dataset, but it's best if you ask the authors of this paper for them directly: http://www.aclweb.org/anthology/W10-3503 . This paper uses the dataset as well: http://nlp.cs.berkeley.edu/pubs/Durrett-Klein_2014_Joint_paper.pdf

octavian-ganea avatar Aug 31 '16 09:08 octavian-ganea

So do we have the dataset? :)

(In addition to the annotations, we need the ACE05 dataset itself.)

TortugaAttack avatar Nov 23 '16 08:11 TortugaAttack

Also, for future me, the guidelines for the entities: https://www.ldc.upenn.edu/collaborations/past-projects/ace/annotation-tasks-and-specifications

TortugaAttack avatar Nov 23 '16 16:11 TortugaAttack

Could you please follow https://hlt-nlp.fbk.eu/technologies/acetowiki

RicardoUsbeck avatar Nov 23 '16 17:11 RicardoUsbeck

Thanks! Now I have the annotations. As soon as they send me the corpus, I will finish the dataset ;)

TortugaAttack avatar Nov 23 '16 19:11 TortugaAttack

So, did you sign the license?

RicardoUsbeck avatar Nov 24 '16 06:11 RicardoUsbeck

Not yet, though I could. I had confused it with the NEEL challenge ;)

TortugaAttack avatar Nov 24 '16 08:11 TortugaAttack

I still do not have access to the corpus, as my request to LDC is on hold until I am a verified member of Leipzig University.

TortugaAttack avatar Feb 01 '17 16:02 TortugaAttack

Who needs to verify you?

RicardoUsbeck avatar Feb 01 '17 17:02 RicardoUsbeck

I think Gerhard Heyer. At least, that is what the LDC folks stated in their response. I will ask him to verify me.

TortugaAttack avatar Feb 02 '17 00:02 TortugaAttack