checklistbank

Use ISO 639-3 (3-letter) language codes in the database

Status: Open · MattBlissett opened this issue 7 years ago · 2 comments

We have checklists containing names in less widely spoken languages, which have only ISO 639-3 (3-letter) language codes.

Our API exposes 3-letter codes, but languages are parsed into 2-letter codes and stored as 2-letter codes in the database.

We should switch to 3-letter codes throughout (while still accepting 2-letter codes, of course).

This checklist has many vernacular names in languages that only have 3-letter codes: https://www.gbif.org/species/search?dataset_key=a0b06e2e-287a-4687-8a6c-2c0cfb31c16d&origin=SOURCE&issue=VERNACULAR_NAME_INVALID&advanced=1

MattBlissett · Oct 05 '18 13:10
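A minimal sketch of "3-letter codes throughout, still accepting 2-letter codes" might normalize every incoming code before storage. This is an illustration only: the class `LanguageCodes` and method `toThreeLetter` are hypothetical, not part of checklistbank or gbif-api. It relies on `java.util.Locale#getISO3Language()`, which returns the ISO 639-2/T code for a 2-letter language. That code generally matches the ISO 639-3 identifier. The sketch passes 3-letter input through unvalidated, on the assumption that a full ISO 639-3 code list is checked elsewhere.

```java
import java.util.Locale;
import java.util.MissingResourceException;
import java.util.Optional;

/** Hypothetical helper: normalizes a raw language code to a lower-case 3-letter code. */
public final class LanguageCodes {

  private LanguageCodes() {}

  public static Optional<String> toThreeLetter(String raw) {
    if (raw == null) {
      return Optional.empty();
    }
    String code = raw.trim().toLowerCase(Locale.ROOT);
    if (!code.matches("[a-z]{2,3}")) {
      return Optional.empty(); // not shaped like an ISO 639 code at all
    }
    if (code.length() == 3) {
      // Assumption: treated as ISO 639-3 already; NOT validated against the code list here.
      return Optional.of(code);
    }
    try {
      // For a 2-letter language, getISO3Language() returns the ISO 639-2/T code,
      // e.g. "fr" -> "fra" (not the bibliographic "fre").
      return Optional.of(Locale.forLanguageTag(code).getISO3Language());
    } catch (MissingResourceException e) {
      // The JDK has no 3-letter mapping for this 2-letter code.
      return Optional.empty();
    }
  }
}
```

With this approach, `toThreeLetter("DE")` gives `deu`, while a 3-letter code such as `cmn` (Mandarin Chinese, which has no 2-letter code) is kept unchanged. The database would then only ever hold 3-letter values.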

The problem boils down to our Language enumeration, which only tracks 2-letter codes: https://github.com/gbif/gbif-api/blob/master/src/main/java/org/gbif/api/vocabulary/Language.java#L36

The API uses this enumeration rather than plain strings.

mdoering · Oct 08 '18 10:10
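One way to keep the enumeration while supporting 3-letter-only languages is to make the 3-letter code the primary key of each constant, with the 2-letter code optional. The sketch below is hypothetical. `LanguageSketch` is NOT GBIF's actual `org.gbif.api.vocabulary.Language`, and only three constants are shown. Lookup accepts either code length.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Hypothetical enum keyed on ISO 639-3; not the real gbif-api Language enum. */
public enum LanguageSketch {
  ENGLISH("en", "eng"),
  GERMAN("de", "deu"),
  MANDARIN(null, "cmn"); // ISO 639-3 only: no 2-letter code exists

  private final String iso2; // may be null
  private final String iso3; // always present

  // Static fields initialize after the constants, so the map is built once they exist.
  private static final Map<String, LanguageSketch> BY_CODE = new HashMap<>();

  static {
    for (LanguageSketch lang : values()) {
      BY_CODE.put(lang.iso3, lang);
      if (lang.iso2 != null) {
        BY_CODE.put(lang.iso2, lang);
      }
    }
  }

  LanguageSketch(String iso2, String iso3) {
    this.iso2 = iso2;
    this.iso3 = iso3;
  }

  public String getIso3() {
    return iso3;
  }

  public Optional<String> getIso2() {
    return Optional.ofNullable(iso2);
  }

  /** Accepts a 2- or 3-letter code, case-insensitively. */
  public static Optional<LanguageSketch> fromCode(String code) {
    if (code == null) {
      return Optional.empty();
    }
    return Optional.ofNullable(BY_CODE.get(code.trim().toLowerCase(Locale.ROOT)));
  }
}
```

The trade-off is the one raised in the next comment: every supported language needs its own constant, so covering all of ISO 639-3 means thousands of entries.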

Changing the DB would not be a big deal, but extending the enum and the LanguageParser would take a bit more work. Wikipedia claims there are currently 7776 3-letter codes. Do we still want to manage them in an enumeration?

mdoering · Oct 08 '18 10:10
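An alternative to a hand-maintained enumeration is to load the codes from a data file at startup. The sketch below is hypothetical. `LanguageRegistry`, `load`, `resolve` and `name` are illustrative names. The column layout is an assumption modelled on the SIL ISO 639-3 code table: a header row, then the 3-letter `Id` in column 0, the 2-letter `Part1` (possibly empty) in column 3 and the reference name in column 6. Check the actual file before relying on these indices.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Hypothetical table-driven alternative to a Language enum. */
public final class LanguageRegistry {
  private final Map<String, String> iso3ToName = new HashMap<>();
  private final Map<String, String> iso2ToIso3 = new HashMap<>();

  private LanguageRegistry() {}

  /** Reads an assumed tab-separated table: col 0 = 3-letter Id, col 3 = 2-letter code, col 6 = name. */
  public static LanguageRegistry load(Reader source) {
    LanguageRegistry reg = new LanguageRegistry();
    try (BufferedReader in = new BufferedReader(source)) {
      in.readLine(); // skip the header row
      String line;
      while ((line = in.readLine()) != null) {
        String[] cols = line.split("\t", -1); // -1 keeps trailing empty columns
        if (cols.length < 7) {
          continue; // malformed row
        }
        String iso3 = cols[0].trim().toLowerCase(Locale.ROOT);
        reg.iso3ToName.put(iso3, cols[6].trim());
        String iso2 = cols[3].trim().toLowerCase(Locale.ROOT);
        if (!iso2.isEmpty()) {
          reg.iso2ToIso3.put(iso2, iso3);
        }
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return reg;
  }

  /** Resolves a 2- or 3-letter code to its 3-letter code, if the table knows it. */
  public Optional<String> resolve(String code) {
    if (code == null) {
      return Optional.empty();
    }
    String c = code.trim().toLowerCase(Locale.ROOT);
    if (iso3ToName.containsKey(c)) {
      return Optional.of(c);
    }
    return Optional.ofNullable(iso2ToIso3.get(c));
  }

  /** Reference name for a 3-letter code, if known. */
  public Optional<String> name(String iso3) {
    return Optional.ofNullable(iso3ToName.get(iso3));
  }
}
```

This keeps all ~7,800 codes out of the source code. Updating to a new ISO 639-3 release then means replacing the data file. The cost is that callers get validated `String` codes rather than type-safe enum constants, which is exactly the trade-off raised for the API.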