marc2bibframe2

Language tags and 880 fields

Open kiegel opened this issue 8 years ago • 3 comments

In regard to internationalization, the logic for applying language tags needs work for parallel-script fields (880), e.g. with translations or parallel titles.

Incorrect Language Tags and Script Subtags

For example, problems crop up with OCLC #271414, an English translation of a Russian work.

<http://lib.washington.edu/ld/test/99114652250001452#Work880-45> a bf:Work ;
    rdfs:label "Евгений Онегин."@en-cyrl ;

The label is Cyrillic but in Russian, not English.

Work [ a bflc:Relationship ;
            bflc:relation [ a bflc:Relation ;
                    rdfs:label "Container of (expression)"@en-cyrl ] ;
            bf:relatedTo <http://lib.washington.edu/ld/test/99114652250001452#Work880-44> ].

The label is English but not Cyrillic. In general, it is vanishingly rare for a string to be both in the English language and in the Cyrillic script.
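A mismatch like this could be caught mechanically before a tag is emitted. The following is only a minimal sketch, not part of the converter: the `SCRIPT_NAME_PREFIX` table and the `script_matches` helper are hypothetical names made up for illustration, and the check relies on Unicode character names starting with the script name. It can detect a wrong script subtag, but it cannot tell Russian from English.

```python
import unicodedata

# Hypothetical mapping from lowercase script subtags to the prefix that
# Unicode character names use for that script (illustrative, not exhaustive).
SCRIPT_NAME_PREFIX = {
    "cyrl": "CYRILLIC",
    "latn": "LATIN",
}


def script_matches(text: str, script_subtag: str) -> bool:
    """Return True if at least one letter in text belongs to the given script."""
    prefix = SCRIPT_NAME_PREFIX[script_subtag.lower()]
    return any(
        unicodedata.name(ch, "").startswith(prefix)
        for ch in text
        if ch.isalpha()
    )


# The two labels from OCLC #271414, both tagged @en-cyrl by the converter.
print(script_matches("Евгений Онегин.", "cyrl"))            # True
print(script_matches("Container of (expression)", "cyrl"))  # False
```

A label that fails this check for its script subtag, like the second one, is a candidate for an untagged literal or a different tag.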

OCLC # 793950140, a Chinese translation of a Japanese work.

<http://lib.washington.edu/ld/test/99131426860001452#Work> a bf:Text,
        bf:Work ;
    rdfs:label "Inō Kanori no Taiwan tōsa nikki. Chinese",
        "伊能嘉矩の臺湾踏柤日記. Chinese"@zh-hani .

The title in the label is Japanese, not Chinese.

OCLC # 893875561, a Latvian book with a parallel title in Russian.

[ a bf:ParallelTitle,
                bf:Title,
                bf:VariantTitle ;
            rdfs:label "Заяц и его друзья : латышские народные сказки о животных"@lv-cyrl ;
            bf:mainTitle "Заяц и его друзья"@lv-cyrl ;
            bf:subtitle "латышские народные сказки о животных"@lv-cyrl ]

The title in the label, mainTitle and subtitle is Russian, not Latvian.

Compliance with IETF RFC 5646

Use of language tags should follow the practices given in IETF RFC 5646 [1]. Concerning the script subtag, on page 12 it states “[it] SHOULD be omitted when it adds no distinguishing value to the tag or when the primary or extended language subtag's record in the subtag registry includes a 'Suppress-Script' field listing the applicable script subtag”.

For example, for OCLC # 1779370:

<http://lib.washington.edu/ld/test/99129152590001452#Agent880-32> a bf:Agent,
        bf:Jurisdiction ;
    rdfs:label "Russia. Министерство народнаго просвѣщенія."@ru-cyrl .

The registry record for Russian has a Suppress-Script field listing Cyrl, so the Cyrillic script subtag should be omitted.
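The Suppress-Script rule could be applied as a post-processing step on generated tags. This is a minimal sketch under stated assumptions: the `SUPPRESS_SCRIPT` table is a tiny hand-copied subset of the IANA Language Subtag Registry, `drop_suppressed_script` is a hypothetical helper name, and the code only handles tags whose script subtag comes directly after the primary language subtag (no extended language subtags).

```python
# Suppress-Script values for a few languages, taken from the IANA Language
# Subtag Registry. Illustrative subset only; a real implementation would
# load the full registry.
SUPPRESS_SCRIPT = {
    "en": "Latn",
    "ru": "Cyrl",
    "ja": "Jpan",
}


def drop_suppressed_script(tag: str) -> str:
    """Remove the script subtag when it equals the language's Suppress-Script."""
    subtags = tag.split("-")
    # A script subtag is four letters; here it is assumed to sit in position 1.
    if len(subtags) >= 2 and len(subtags[1]) == 4 and subtags[1].isalpha():
        language, script = subtags[0].lower(), subtags[1].lower()
        if SUPPRESS_SCRIPT.get(language, "").lower() == script:
            del subtags[1]
    return "-".join(subtags)


print(drop_suppressed_script("ru-cyrl"))  # ru
print(drop_suppressed_script("en-cyrl"))  # en-cyrl
```

Note that `en-cyrl` passes through unchanged: the rule only removes a redundant script subtag and does nothing about a wrong language subtag.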

Not Good Practice

Using a language tag for numeric data in bf:part is not wrong, but it is probably not good practice.

<http://lib.washington.edu/ld/test/99129152590001452#Instance880-38> a bf:Instance ;
    bf:part "1825-29"@ru-cyrl ;
    bf:title [ a bf:Title ;
            rdfs:label "Записки"@ru-cyrl ] .

[1] https://tools.ietf.org/html/bcp47

kiegel, Dec 19 '17 16:12

This is complicated, since I think some of this is bad data rather than bad conversion. We'll investigate and report back.

kirkhess, Dec 22 '17 13:12

I've also seen the converter create @ru-cyrl language tags where the -cyrl is redundant and forbidden by BCP 47. I've chosen to ignore them for now.

osma, Dec 22 '17 13:12

The specs are going to be updated - pretty sure the best solution is to stop adding tags based on 008+$6.

If the MARC record included the language along with the script, it would be different. That is technically possible, and we were going to look into it as well.

kirkhess, Jan 05 '18 14:01