Using INCEpTION for text classification

Open xegulon opened this issue 4 years ago • 12 comments

Is your feature request related to a problem? Please describe.

INCEpTION is excellent when it comes to annotating relations and entities, but I don't find it suitable for text classification tasks, i.e. when I want to build datasets where the label applies to a whole sequence of text.

Describe the solution you'd like

When I create a new project, currently the choices are Basic annotation (Span/relation), Entity Linking (Wikidata) and Standard project. I would like to have an additional choice: Text classification. The annotations would be exportable as a two-column CSV (first column: the text string, second column: the label), or as a JSON file (or JSONL).
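To make the requested export concrete, here is a minimal sketch of what the two formats could look like. It only illustrates the target file layout; the example data, the `text`/`label` column names, and the helper functions are assumptions and not part of INCEpTION.

```python
import csv
import io
import json

# Hypothetical annotated examples: (text, label) pairs.
examples = [
    ("The battery lasts all day.", "positive"),
    ("The screen cracked, and support never replied.", "negative"),
]


def to_csv(rows):
    """Serialize (text, label) pairs as two-column CSV with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["text", "label"])  # assumed column names
    writer.writerows(rows)
    return buffer.getvalue()


def to_jsonl(rows):
    """Serialize (text, label) pairs as JSONL: one JSON object per line."""
    return "".join(
        json.dumps({"text": text, "label": label}) + "\n" for text, label in rows
    )


print(to_csv(examples))
print(to_jsonl(examples))
```

The `csv` module quotes the second text automatically because it contains a comma, so the file still has exactly two columns per record.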

Describe alternatives you've considered

  • I could use Doccano for that, but I would lose the inter-annotator features INCEpTION has, and I would be forced to use several tools instead of one (INCEpTION for spans/relations and Doccano for sequence classification).
  • I could use the Basic annotation (Span/relation) project, label only the first token with the class the whole sequence belongs to, and find a way to manage the conversion afterwards, but this is kinda tricky.

Additional context

The UI for spans/relations is quite advanced, so I think it wouldn't be hard to make a UI for text classification! Thanks again for the wonderful tool!

xegulon avatar Jul 16 '21 08:07 xegulon

See https://inception-project.github.io/releases/0.19.7/docs/user-guide.html#_document_metadata

However, inter-annotator calculation is presently not available for document-level annotations.

When using document-level annotations, the export format must be XMI CAS - the other formats do not support it.

reckart avatar Jul 16 '21 09:07 reckart

Thanks for the answer. Is it on the roadmap to implement inter-annotator agreement features for text classification?

xegulon avatar Jul 26 '21 08:07 xegulon

I found this @reckart : https://colab.research.google.com/github/inception-project/inception-project.github.io/blob/master/_example-projects/python/INCEpTION_Annotations_as_one_sentence_and_label_per_line.ipynb

Is it possible to use the inter-annotator features with that?

xegulon avatar Jul 27 '21 09:07 xegulon

@xegulon Not sure what you mean?

If you mean whether you could use that code as a basis to export your data from INCEpTION and then do your agreement calculation externally - you could probably do that.
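As a rough illustration of what such an external calculation might look like, here is a minimal sketch of Cohen's kappa for two annotators who each assigned one label per document. The choice of Cohen's kappa, the function name, and the sample labels are assumptions made for this example; this is not something INCEpTION or the linked notebook provides.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same documents.

    Both lists must be aligned: position i holds each annotator's label
    for document i.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")

    n = len(labels_a)
    # Observed agreement: fraction of documents with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Expected chance agreement from each annotator's label distribution.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)

    if expected == 1.0:
        # Both annotators used one and the same label for every document.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical document-level labels exported for two annotators.
annotator_1 = ["pos", "pos", "neg", "neg"]
annotator_2 = ["pos", "neg", "neg", "neg"]
print(cohen_kappa(annotator_1, annotator_2))
```

With more than two annotators, a different measure (for example Fleiss' kappa or Krippendorff's alpha) would be needed.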

Regarding agreement for document-level features in the application: we now have an issue for it on the roadmap (this issue here), but the roadmap is rather dynamic - so no particular time for this feature to arrive atm.

reckart avatar Jul 27 '21 13:07 reckart

Great, thanks. I'll go with the first solution for now, I think. Eager to see the coming dedicated UI!

xegulon avatar Jul 28 '21 12:07 xegulon

One clarification: it would be great if the UI for text classification supported not only assigning a single class, but also several classes (multi-label text classification).

Also, for single text examples that span multiple lines, it would be important to be able to import datasets as JSON(L), rather than only as one sentence per line.

xegulon avatar Jul 29 '21 09:07 xegulon

You mean you'd like a format that imports "one document per line"?

reckart avatar Jul 29 '21 09:07 reckart

In a sense, yes. But this would only be possible with JSON files. The goal is to be able to handle texts that contain newline characters.
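To illustrate why JSON(L) helps here: JSON escapes a newline inside a string as `\n`, so each record stays on one physical line while the text itself keeps its line breaks. The sketch below assumes a hypothetical `text` field; it is not an existing INCEpTION import format.

```python
import json

# Two hypothetical records, one per physical line. The first text contains
# a newline, which JSON stores as the two characters backslash + n.
jsonl_data = (
    '{"text": "First paragraph.\\nSecond paragraph."}\n'
    '{"text": "A single-line document."}\n'
)


def read_documents(jsonl_text):
    """Return one document text per non-empty JSONL line."""
    return [
        json.loads(line)["text"]
        for line in jsonl_text.split("\n")
        if line.strip()
    ]


documents = read_documents(jsonl_data)
print(len(documents))       # number of documents, not number of text lines
print(repr(documents[0]))   # the embedded newline is restored on parsing
```

A plain "one sentence per line" reader would instead split the first document into two separate examples.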

xegulon avatar Jul 29 '21 15:07 xegulon

I also have a similar use case: multi-label document classification, e.g. keyphrase extraction/generation. I would also like to use an external recommender, backed by an unsupervised model at the beginning, to do some active learning and help build our dataset (or to import an existing one). But I didn't find a way to configure such an external recommender for this. I think it would also be very helpful to have this kind of feature. In any case, thanks for this wonderful tool!

GeoloeG-IsT avatar Nov 12 '21 20:11 GeoloeG-IsT

I am wondering about this functionality as well. My team needs document-level annotation, and we need agreement scores for it. Is there a plan to implement agreement scores for document-level annotations?

jeweinb avatar Jun 09 '23 13:06 jeweinb

I don't see an issue for document-level agreement in our tracker yet. Feel free to add one. Note though that having an issue is just to keep it on the radar - it does not make it a priority.

reckart avatar Jun 09 '23 16:06 reckart