gerbil
ACE05 dataset
Hi. Is there a plan to include the ACE05 dataset that has annotations for both coreference resolution and entity linking? It is quite a big dataset. Details: http://www.aclweb.org/anthology/W10-3503
We (@tortugaAttack ;) ) will check the dataset and include it if eligible.
Dataset info page: https://catalog.ldc.upenn.edu/LDC2006T06 From my point of view, we would need an "LDC User Agreement for Non-Members".
Example: https://catalog.ldc.upenn.edu/desc/addenda/LDC2006T06.txt
Signing the license as University Leipzig shouldn't be a problem since we already use other LDC datasets.
Okay. However, let's make sure that we can use the dataset. It contains a lot of markup, e.g., relations, dates and pronouns. Just take a look at the example and you will see what I am talking about. We should check whether the named entities that are marked in the text can be used. Otherwise, it will take a lot of effort to improve the dataset quality.
Oh I see, there are no links to Wikipedia, or am I missing something, @octavian-ganea?
@RicardoUsbeck I have the Wiki annotations, which are separate from the original dataset, but it's best if you ask the authors of this paper for them directly: http://www.aclweb.org/anthology/W10-3503 . This paper uses this dataset as well: http://nlp.cs.berkeley.edu/pubs/Durrett-Klein_2014_Joint_paper.pdf
So do we have the dataset? :)
(Besides the annotations, we also need the ACE05 dataset itself.)
Also, for future me: guideline to the entities: https://www.ldc.upenn.edu/collaborations/past-projects/ace/annotation-tasks-and-specifications
Could you please follow https://hlt-nlp.fbk.eu/technologies/acetowiki
Thanks! Now I have the annotations. As soon as they send me the corpus, I will finish the dataset ;)
So, did you sign the license?
Not yet, but I could. I confused it with the NEEL challenge ;)
I still do not have access to the corpus, as my request to LDC is paused until I am a verified member of the University Leipzig.
Who needs to verify you?
I think Gerhard Heyer. At least the folks at LDC said so in their response. I will ask him to verify me.