
HTML would be a good way to display the annotation outside INCEpTION

Open reckart opened this issue 4 years ago • 3 comments

For importing HTML files, we use the DKPro Core HtmlDocumentReader. Its counterpart for writing files is the XmlDocumentWriter.

The HtmlDocumentReader takes the XML/HTML structure of the HTML file being read and represents it as annotations (these annotations are not visible in INCEpTION). The XmlDocumentWriter takes the annotation-encoded XML/HTML structure and writes it back into an XML (HTML) file. In this process, only the text and the XML/HTML annotations are considered. Other annotations are completely ignored because there is no guarantee that they could be aligned with the strictly hierarchical XML/HTML structure.
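To illustrate the alignment problem, here is a minimal Python sketch (not DKPro Core code; the function name and the offsets are made up for illustration). It checks whether an annotation span partially overlaps an element span, which is the case that cannot be expressed as properly nested XML/HTML without splitting the annotation.

```python
def crosses(span, element):
    """Return True if two (begin, end) ranges partially overlap.

    Nested or disjoint ranges fit into a hierarchical structure;
    partially overlapping ranges do not.
    """
    (b1, e1), (b2, e2) = span, element
    overlap = b1 < e2 and b2 < e1
    nested = (b1 >= b2 and e1 <= e2) or (b2 >= b1 and e2 <= e1)
    return overlap and not nested


# Hypothetical document "<p>Hello</p><p>world</p>" with text "Helloworld":
first_paragraph = (0, 5)
print(crosses((1, 4), first_paragraph))   # annotation inside the paragraph
print(crosses((3, 8), first_paragraph))   # annotation running into the next paragraph
```

An annotation like the second one would have to be split at the paragraph boundary before it could be written as an element of its own.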

If you wanted to export a visualization, there would in theory be the BratWriter, but I believe you would find its output surprising and strange because it looks nothing like the brat visualization you see in INCEpTION. This is because INCEpTION has access to additional information on layers and features which the DKPro Core BratWriter does not have: that additional information is part of INCEpTION and not of UIMA.

A possible idea might be to take the XmlDocumentWriter and enhance it into an HtmlDocumentWriter which would store annotations, e.g. as W3C microdata, while also embedding some JavaScript to interpret that microdata and render it as highlights over the text.
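As a rough illustration of that idea (a sketch only, not an existing DKPro Core or INCEpTION component), the following Python code wraps annotation spans in elements carrying microdata attributes and appends a small script that highlights them. The function names, the `https://example.org/Annotation` item type, the `label` and `text` property names and the highlight colour are all assumptions made for the example, and overlapping spans are deliberately not handled.

```python
from html import escape

# Hypothetical client-side script: highlights every microdata item and
# shows its label as a tooltip.
HIGHLIGHT_SCRIPT = """<script>
document.querySelectorAll('[itemscope]').forEach(function (el) {
  var label = el.querySelector('meta[itemprop="label"]');
  el.style.backgroundColor = '#ffe08a';
  if (label) { el.title = label.content; }
});
</script>"""


def render_microdata_html(text, spans):
    """Wrap each (begin, end, label) span in a <span> with microdata attributes.

    Assumes the spans do not overlap; this sketch raises otherwise.
    """
    parts = []
    cursor = 0
    for begin, end, label in sorted(spans):
        if begin < cursor:
            raise ValueError("overlapping spans are not handled in this sketch")
        parts.append(escape(text[cursor:begin]))
        parts.append(
            '<span itemscope itemtype="https://example.org/Annotation">'
            f'<meta itemprop="label" content="{escape(label)}">'
            f'<span itemprop="text">{escape(text[begin:end])}</span>'
            "</span>"
        )
        cursor = end
    parts.append(escape(text[cursor:]))
    return "".join(parts)


def to_html_document(text, spans):
    """Build a standalone HTML page from the text and its spans."""
    body = render_microdata_html(text, spans)
    return (
        "<!DOCTYPE html>\n<html><body>\n"
        f"<p>{body}</p>\n{HIGHLIGHT_SCRIPT}\n</body></html>"
    )


print(to_html_document("Ann met Bob.", [(0, 3, "PER"), (8, 11, "PER")]))
```

Saving the printed output to a file and opening it in a browser should show the two names highlighted, with the label available as a tooltip.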

Originally posted by @reckart in https://github.com/inception-project/inception/issues/2070#issuecomment-797295940

reckart, Jun 05 '21 20:06

Hello,

Thanks again for this amazing project.

I think I may have brought up the matter of HTML as the export format so that the content can be displayed on websites without being images.

The "image as display" was an issue with the old Quran Corpus format, but they seem to be using HTML now:

https://qurancorpus.app/treebank/8:21

https://github.com/kaisdukes/quranic-corpus

Will INCEpTION also support something similar to what the modern Quran Corpus app has?

Kentoseth, Oct 03 '23 22:10

The Quran Corpus reader appears to be using a JavaScript-based SVG renderer to turn JSON into an SVG image.

The input format to this renderer can be seen here: https://qurancorpus.app/api/syntax?location=8:21&graph=1

Let's assume that the open source quranic-corpus repo allows you to deploy their web interface with custom data. In that case, you could export the annotated data from INCEpTION as CAS XMI XML (1.0) or CAS JSON (0.4.0), read it using DKPro Cassis and then use custom mapping logic to write it out again in the Quran Corpus JSON format. After feeding that data to the corpus web interface, you should then be able to see it.
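A minimal Python sketch of that pipeline, under several assumptions: the INCEpTION export has been unpacked into an XMI file plus a `TypeSystem.xml`, DKPro Cassis is installed (`pip install dkpro-cassis`), and the annotations of interest carry their label in a single feature. The target layout written by `to_target_json` is a made-up placeholder, not the actual Quran Corpus JSON schema; the real mapping would have to follow whatever the corpus web interface expects.

```python
import json


def load_spans(xmi_path, typesystem_path, type_name, label_feature):
    """Read all annotations of one type from a CAS XMI export."""
    import cassis  # third-party; imported here so the mapping below works without it

    with open(typesystem_path, "rb") as f:
        typesystem = cassis.load_typesystem(f)
    with open(xmi_path, "rb") as f:
        cas = cassis.load_cas_from_xmi(f, typesystem=typesystem)
    return [
        {
            "begin": a.begin,
            "end": a.end,
            "text": a.get_covered_text(),
            "label": getattr(a, label_feature, None),
        }
        for a in cas.select(type_name)
    ]


def to_target_json(spans):
    """Map spans to a HYPOTHETICAL target layout (placeholder schema)."""
    tokens = [
        {"id": i, "form": s["text"], "tag": s.get("label")}
        for i, s in enumerate(spans, start=1)
    ]
    return json.dumps({"tokens": tokens}, ensure_ascii=False, indent=2)


# Stand-in data in the shape load_spans() returns, so the mapping can be
# tried without an actual export.
example = [
    {"begin": 0, "end": 3, "text": "Ann", "label": "PER"},
    {"begin": 8, "end": 11, "text": "Bob", "label": "PER"},
]
print(to_target_json(example))
```

With a real export, the call might look like `load_spans("doc.xmi", "TypeSystem.xml", "de.tudarmstadt.ukp.dkpro.core.api.ner.type.NamedEntity", "value")`; the file names here are placeholders.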

reckart, Oct 04 '23 05:10

Is it possible for INCEpTION to have a similar feature, i.e. "using a JavaScript-based SVG renderer to turn JSON into an SVG image"?

This way the solution can be generalized to support all linguistic annotations. I think Quran Corpus is also Java-based, so maybe much of the code can be re-used to achieve the same "export as SVG image"?

Kentoseth, Oct 04 '23 11:10