archive-hocr-tools
Feature: Add ISO 639 part2b support for normalize_language
This commit adds support for converting ISO 639 Part 2B language codes (e.g. `fre` for French, rather than the Part 3 `fra`) to their two-character equivalents.
IA items will often include `fre`, `ger`, etc., in the metadata language field (see, e.g., https://archive.org/metadata/101610331.nlm.nih.gov/metadata/language).
But these values were being passed through as literal strings (e.g. `fre`) rather than being converted to two-character codes (e.g. `fr`). DAISY and EPUB readers don't recognize `fre` as a valid language, and instead display the literal string.
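The behavior described above can be illustrated with a minimal sketch. This is not the code from this commit: the function name `normalize_language_sketch` and both lookup tables are hypothetical, and the tables cover only a handful of languages. It assumes that unknown values should pass through unchanged, as the message implies was happening to `fre` before the fix.

```python
# Hypothetical, partial tables for illustration only.
# ISO 639-2/B (bibliographic) codes that differ from the Part 3 form.
PART2B_TO_TWO_LETTER = {
    "fre": "fr",  # French
    "ger": "de",  # German
    "dut": "nl",  # Dutch
    "chi": "zh",  # Chinese
    "cze": "cs",  # Czech
    "gre": "el",  # Modern Greek
}

# ISO 639-3 codes for the same languages, plus English.
PART3_TO_TWO_LETTER = {
    "fra": "fr",
    "deu": "de",
    "nld": "nl",
    "zho": "zh",
    "ces": "cs",
    "ell": "el",
    "eng": "en",
}


def normalize_language_sketch(code):
    """Return a two-character code when one is known, else the input unchanged."""
    key = code.strip().lower()
    # Check the Part 2B table first, then fall back to the Part 3 table.
    for table in (PART2B_TO_TWO_LETTER, PART3_TO_TWO_LETTER):
        if key in table:
            return table[key]
    # Unknown values (including codes that are already two characters) pass through.
    return code


print(normalize_language_sketch("fre"))
print(normalize_language_sketch("fra"))
```

With a Part 2B table in place, both `fre` and `fra` resolve to `fr`; without it, `fre` would fall through to the final `return code` and reach the reader as a literal string.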