Recommendation for link to ALTO in iiif manifest

Open cneud opened this issue 9 years ago • 11 comments

IIIF defines a Presentation API that allows OCR results in ALTO, where available, to be represented as annotations linked from a manifest.

Example:

seeAlso: {
@id: "http://wellcomelibrary.org/service/alto/b19956435/0?image=0",
format: "application/alto+xml", 
profile: "http://www.loc.gov/standards/alto/",
label: "ALTO"
}

It would be good to have a recommendation from the ALTO board on the values for two fields, format and label. The format should resemble a MIME type, e.g. application/xml or text/xml, while the latter can be simple text such as "ALTO XML", "ALTO OCR" or similar.

cneud avatar Jul 21 '16 16:07 cneud

Also, should the profile refer to the XSD, the namespace, or something else?

cneud avatar Jul 27 '16 15:07 cneud

Since version information can be important for data consumers, a reference that indicates the version would make sense for the profile. If there are no breaking changes between minor versions with regard to how OCR text is expressed in ALTO, the namespace would suffice.
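
To illustrate the namespace option, here is a minimal Python sketch of how a data consumer might read the namespace URI from an ALTO file and use it as the profile value. The function name `alto_namespace` and the inline sample document are assumptions made for this example, not part of any ALTO or IIIF tooling.

```python
import xml.etree.ElementTree as ET


def alto_namespace(xml_text: str) -> str:
    """Return the namespace URI of the root element, or '' if it has none.

    Hypothetical helper for illustration only.
    """
    root = ET.fromstring(xml_text)
    # ElementTree stores qualified tag names as "{namespace}localname".
    if root.tag.startswith("{"):
        return root.tag[1:].split("}", 1)[0]
    return ""


# Tiny stand-in for a real ALTO file; the namespace value is a sample.
sample = '<alto xmlns="http://www.loc.gov/standards/alto/ns-v3#"><Layout/></alto>'
profile = alto_namespace(sample)
print(profile)
```

Reading the namespace from the file itself avoids maintaining a separate version field, but it only distinguishes versions that actually use different namespaces.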

kba avatar Jul 27 '16 16:07 kba

First of all, I appreciate the IIIF initiative and am glad that ALTO is being considered as a standard format in this API. Since ALTO contains text content rather than application-specific information, the format should be "text/xml". This matches what has been used for the MIMETYPE attribute in existing METS profiles and in the Europeana Newspapers project. Regarding the "profile", I agree with kba's statement. As for the "label", I assume it is used only for display purposes, so spacing is not an issue.

So I would recommend the following for an ALTO file of version 3:

seeAlso: {
@id: "http://wellcomelibrary.org/service/alto/b19956435/0?image=0",
format: "text/xml", 
profile: "http://www.loc.gov/standards/alto/v3",
label: "ALTO XML"
}
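
As a rough illustration of how such an entry could be produced and consumed, the following Python sketch builds a seeAlso entry with the four fields shown above and then looks it up again by its profile. The helper names, the canvas `@id` under example.org, and the choice to match on the profile prefix rather than on `format` are all assumptions of this example; matching on the profile is used only because the format value is still under discussion here.

```python
import json

# Prefix shared by the profile URIs used in this thread.
ALTO_PROFILE_PREFIX = "http://www.loc.gov/standards/alto/"


def alto_see_also(alto_url, profile, fmt="text/xml", label="ALTO XML"):
    """Build one seeAlso entry (hypothetical helper, defaults as proposed above)."""
    return {"@id": alto_url, "format": fmt, "profile": profile, "label": label}


def find_alto_links(resource):
    """Return the @id of every seeAlso entry whose profile points at ALTO."""
    see_also = resource.get("seeAlso", [])
    # Tolerate a single object as well as a list of objects.
    if isinstance(see_also, dict):
        see_also = [see_also]
    return [
        entry["@id"]
        for entry in see_also
        if isinstance(entry, dict)
        and str(entry.get("profile", "")).startswith(ALTO_PROFILE_PREFIX)
    ]


alto_url = "http://wellcomelibrary.org/service/alto/b19956435/0?image=0"
canvas = {
    "@id": "https://example.org/iiif/canvas/0",  # hypothetical canvas URI
    "@type": "sc:Canvas",
    "seeAlso": [alto_see_also(alto_url, "http://www.loc.gov/standards/alto/v3")],
}

print(json.dumps(canvas, indent=2))
print(find_alto_links(canvas))
```

A consumer written this way keeps working whether the format ends up as "text/xml" or "application/alto+xml", since it never inspects that field.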

Jo-CCS avatar Jul 28 '16 07:07 Jo-CCS

I wonder whether it might be worth considering the registration of a MIME type "application/alto+xml", similar to what RFC6207 specifies for METS/MODS/MADS/MARC21/SRU.

cneud avatar Jul 28 '16 11:07 cneud

@Jo-CCS Yes, "label" is a free text field and only used for orientation.

cneud avatar Jul 28 '16 13:07 cneud

Yes, registering the MIME type "application/alto+xml" also makes sense to me.

Jo-CCS avatar Aug 08 '16 05:08 Jo-CCS

"application/alto+xml" sounds great to me. The IIIF documentation already has some samples with "application/tei+xml".

altomator avatar Oct 27 '16 14:10 altomator

Note that "application/tei+xml" also has RFC6129 supporting it. We should therefore check whether "application/alto+xml" can be included in an update to RFC6207 and how, or whether a new RFC must be prepared (by whom?)

cneud avatar Oct 27 '16 14:10 cneud

To register alto+xml, we need to write an RFC and submit it to iana.org. -> tei+xml: https://tools.ietf.org/html/rfc6129

My BnF colleagues argue that it's not mandatory. E.g., application/warc isn't registered with IANA, but it's an ISO standard. -> http://www.iso.org/iso/catalogue_detail.htm?csnumber=44717

altomator avatar Jan 10 '17 13:01 altomator

Certainly one can also live without the RFC, but note that due to this, WARC is also not currently considered a registered MIME-type, cf. https://kris-sigur.blogspot.de/2016/05/warc-mime-type.html "if we wish to have this standardized then going through this process is the only option"

cneud avatar Jan 10 '17 13:01 cneud

RFC draft: https://docs.google.com/document/d/1Bu9BWDlgdj_ALk1Z7uNY5bX93y5LqEbwvm0RC0w0kvc/edit?usp=sharing

altomator avatar May 04 '17 14:05 altomator