grobid-ner

Question about tagging health facilities, correctional institutions, factories, etc.

Open alexeyev opened this issue 1 year ago • 3 comments

Dear colleague,

thank you for your great work on the very thoroughly written NER annotation guidelines and the grobid project in general.

We are considering using your guidelines as a basis for a similar task. However, during my trial annotation sessions, I got confused when trying to distinguish between ORGANISATION/BUSINESS/INSTITUTION/INSTALLATION in a few cases, and I did not manage to find any relevant evidence in the datasets in the repository to make a decision.

May I ask you to share your vision?

(1) Let's say the sentence is about restoration works at a certain state hospital. Which tag should I use for the hospital name, INSTITUTION? What would you suggest doing with mentions of schools (probably INSTITUTION, by analogy with universities?), prisons, theatres, cinemas?

(2) What would you suggest in this case:

A truck completely burned down on the <tag>Bishkek-Naryn-Torugart bypass road</tag>.

I suppose things like roads, while being anthropogenic structures, should nevertheless be tagged as LOCATION?

(3) The news texts I am working with have a few mentions of both state-owned and private factories; should both of these types of plants be annotated with BUSINESS, in your opinion?

Thank you in advance --- and best regards.

alexeyev avatar May 21 '23 07:05 alexeyev

Dear @alexeyev, thanks for your interest in grobid-ner.

Have you seen the documentation related to the different classes / senses?

  • general summary: https://grobid-ner.readthedocs.io/en/latest/class-and-senses/
  • in-depth discussion of possible overlaps: https://grobid-ner.readthedocs.io/en/latest/class-and-senses/#classes-specific-guidelines (there is a specific discussion of the difference between ORGANISATION and INSTITUTION).

To answer your questions:

  1. I think the hospital should be considered an INSTITUTION, as well as schools and prisons. I'm not sure about theatres and cinemas; they could be BUSINESS.
  2. indeed, the road name is a LOCATION in my view (see: https://grobid-ner.readthedocs.io/en/latest/class-and-senses/#location)
  3. I suppose so. If you could share the specific examples, I can be more precise, as sometimes the context can be misleading.
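
For annotators who want these suggestions in a checkable form, the answers above can be written down as a small lookup table. This is only a sketch: the names `SUGGESTED_CLASS` and `suggest_class` are hypothetical and are not part of grobid-ner, and the "firm" flag simply records which answers in this reply were given with reservations.

```python
# Hypothetical summary of the suggestions given in this reply.
# Nothing here is part of grobid-ner; it is an annotator's cheat sheet.
# Each entry maps a kind of entity to (suggested class, answer was firm).
SUGGESTED_CLASS = {
    "hospital": ("INSTITUTION", True),
    "school": ("INSTITUTION", True),
    "prison": ("INSTITUTION", True),
    "theatre": ("BUSINESS", False),  # "I'm not sure", could be BUSINESS
    "cinema": ("BUSINESS", False),   # same reservation as for theatres
    "road": ("LOCATION", True),
    "factory": ("BUSINESS", False),  # "I suppose so", depends on context
}


def suggest_class(kind: str) -> str:
    """Return the suggested class, with a trailing '?' for tentative answers."""
    label, firm = SUGGESTED_CLASS[kind.lower()]
    return label if firm else label + "?"
```

Entries marked as tentative are the ones where the surrounding context should decide, so they are worth reviewing case by case rather than applying mechanically.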

lfoppiano avatar May 31 '23 08:05 lfoppiano

Dear @lfoppiano, thank you for the response! We've added your pointers to our instructions.

Thanks! Yes, we have studied the documentation; yet, the annotation instructions can never be 100% complete, if I may say so :)

We can and should of course have some freedom in details and interpretations in our adaptation. However, we value your expert opinion. May we ask you to share your vision? A few more cases, if we may.

For example, in the sentence below a very precise location is given: district, road name, distance from the city. We also have the name of the truck manufacturer. I quote the whole sample because we have quite a few similar cases in our news data.

Original text in Kyrgyz: бүгүн Токтогул районунда Бишкек -ош жолунун 210-чакырымында "Мерседес" үлгүсүндөгү жүк ташуучу унаа көпүрөдөн өтүп бара жатып сууга түшүп кеткен

Translation into English: in Toktogul district, on 210 km of Bishkek-Osh highway, a Mercedes truck fell into the water while crossing a bridge

  1. Would you suggest selecting a whole location description as a single location span? (Toktogul district, on 210 km of Bishkek-Osh highway, Токтогул районунда Бишкек -ош жолунун 210-чакырымында)
  2. Would you select a 'Mercedes truck' ("Мерседес" үлгүсүндөгү жүк ташуучу унаа) as an ARTIFACT? Or just the name of the company 'Mercedes' as BUSINESS?

Having already started the annotation, we get a lot of questions about titles as well.

  3. Suppose we have a PERSON entity Жогорку Кеңештин депутаты Аскар Аскаров (Higher Council (Parliament) member (delegate) Askar Askarov). In other sentences in the same text this person may be referred to simply as 'депутат'. Should we tag those 'депутат' mentions as a TITLE, in your opinion?

Thank you!

alexeyev avatar Jun 07 '23 06:06 alexeyev

Dear @alexeyev, indeed the annotations are not always clear and require some interpretation. We have tried to document every decision we made well, so that the reasons behind it can be understood.

Regarding your questions, here is what I think:

  1. In our latest guidelines we decided to try to collect the largest entity match (see the detailed information, in case you haven't seen it: https://grobid-ner.readthedocs.io/en/latest/largest-entity-mention/), so in this case I would collect the whole location description.
  2. If you choose "Mercedes truck", then it should be ARTIFACT; if you select only "Mercedes", it would be, as you said, BUSINESS. Which one to pick is up to you, depending on your strategy (see the previous point).
  3. This is a bit tricky; however, I think I would agree with you and use TITLE. This is also something we have struggled with (https://grobid-ner.readthedocs.io/en/latest/class-and-senses/#title).
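
To make the "largest entity match" idea concrete, here is a minimal sketch applied to the English translation of the example sentence. It assumes that candidate spans are given as character offsets and that "largest" can be approximated by dropping any candidate fully contained in a longer one. The names `Span`, `keep_largest` and `span_of` are hypothetical helpers for illustration, not grobid-ner code, and the linked guidelines remain the reference for real edge cases.

```python
from typing import List, NamedTuple


class Span(NamedTuple):
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    label: str


def keep_largest(spans: List[Span]) -> List[Span]:
    """Drop every span that lies inside a span that was already kept."""
    # Visit the longest candidates first, so containers are kept before
    # the shorter spans nested inside them are examined.
    ordered = sorted(spans, key=lambda s: (-(s.end - s.start), s.start))
    kept: List[Span] = []
    for s in ordered:
        if not any(k.start <= s.start and s.end <= k.end for k in kept):
            kept.append(s)
    return sorted(kept, key=lambda s: s.start)


text = ("in Toktogul district, on 210 km of Bishkek-Osh highway, "
        "a Mercedes truck fell into the water while crossing a bridge")


def span_of(phrase: str, label: str) -> Span:
    """Locate the first occurrence of `phrase` in `text` and label it."""
    start = text.index(phrase)
    return Span(start, start + len(phrase), label)


candidates = [
    span_of("Toktogul district", "LOCATION"),
    span_of("Bishkek-Osh highway", "LOCATION"),
    span_of("Toktogul district, on 210 km of Bishkek-Osh highway", "LOCATION"),
    span_of("Mercedes", "BUSINESS"),
    span_of("Mercedes truck", "ARTIFACT"),
]

for span in keep_largest(candidates):
    print(span.label, "|", text[span.start:span.end])
```

With these candidates, only the full location description and "Mercedes truck" survive, which corresponds to answers 1 and 2 under a consistent largest-match strategy. An annotation scheme that prefers the shorter "Mercedes" as BUSINESS would simply skip this filter for that class.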

Let me know if you need further clarification.

lfoppiano avatar Jun 13 '23 01:06 lfoppiano