Support special string operations for Turkish and Azeri languages
Turkish (tr) and Azeri (az) distinguish between dotted and dotless i in both cases: dotted i/İ and dotless ı/I. To typeset these languages correctly, lowercase dotted i must map to uppercase dotted İ (when upper-casing), and uppercase dotless I must map to lowercase dotless ı (when lower-casing).
Go already supports these special case mappings through the following functions:
- https://pkg.go.dev/strings#ToUpperSpecial
- https://pkg.go.dev/strings#ToLowerSpecial
- https://pkg.go.dev/strings#ToTitleSpecial
Adding this feature to Hugo will improve Turkish and Azeri language support.
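To make the difference concrete, here is a minimal sketch comparing the default case functions with their Special counterparts. The helper names turkishUpper and turkishLower are made up for illustration; they are not part of Hugo or the standard library.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// turkishUpper is a hypothetical helper: it upper-cases s using the
// Turkish special casing rules, so dotted i becomes dotted İ.
func turkishUpper(s string) string {
	return strings.ToUpperSpecial(unicode.TurkishCase, s)
}

// turkishLower is a hypothetical helper: it lower-cases s using the
// Turkish special casing rules, so dotless I becomes dotless ı.
func turkishLower(s string) string {
	return strings.ToLowerSpecial(unicode.TurkishCase, s)
}

func main() {
	fmt.Println(strings.ToUpper("istanbul")) // ISTANBUL
	fmt.Println(turkishUpper("istanbul"))    // İSTANBUL
	fmt.Println(strings.ToLower("ISPARTA"))  // isparta
	fmt.Println(turkishLower("ISPARTA"))     // ısparta
}
```

Letters without a special rule fall back to the regular Unicode mapping, so only i and I behave differently from strings.ToUpper and strings.ToLower.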
According to the Godoc for https://pkg.go.dev/strings#ToTitleSpecial:
ToTitleSpecial returns a copy of the string s with all Unicode letters mapped to their Unicode title case, giving priority to the special casing rules.
Where do we get the "special casing rules" for a given language?
I think they are already implemented in the unicode package as unicode.TurkishCase and unicode.AzeriCase. These are the only special casing rules that the package defines.
Besides the functions in the strings package, the unicode package documentation also gives the following example for SpecialCase.
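One possible way to pick the rules per language is a small lookup from language code to unicode.SpecialCase. This is only a sketch: specialCaseFor and upperFor are hypothetical names, not an existing Hugo API, and it assumes the language code is exactly "tr" or "az" (variants such as "tr-TR" would need extra handling).

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// specialCaseFor is a hypothetical helper that returns the special casing
// rules for a language code, or false if the language has none.
// Assumption: codes are matched exactly ("tr", "az").
func specialCaseFor(lang string) (unicode.SpecialCase, bool) {
	switch lang {
	case "tr":
		return unicode.TurkishCase, true
	case "az":
		return unicode.AzeriCase, true
	}
	return nil, false
}

// upperFor upper-cases s, giving priority to the language's special
// casing rules when they exist and using strings.ToUpper otherwise.
func upperFor(lang, s string) string {
	if sc, ok := specialCaseFor(lang); ok {
		return strings.ToUpperSpecial(sc, s)
	}
	return strings.ToUpper(s)
}

func main() {
	fmt.Println(upperFor("tr", "istanbul")) // İSTANBUL
	fmt.Println(upperFor("az", "indi"))     // İNDİ
	fmt.Println(upperFor("en", "istanbul")) // ISTANBUL
}
```

The same lookup would work for ToLowerSpecial and ToTitleSpecial, since all three take a unicode.SpecialCase as their first argument.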
package main

import (
	"fmt"
	"unicode"
)

func main() {
	t := unicode.TurkishCase

	const lci = 'i'
	fmt.Printf("%#U\n", t.ToLower(lci))
	fmt.Printf("%#U\n", t.ToTitle(lci))
	fmt.Printf("%#U\n", t.ToUpper(lci))

	const uci = 'İ'
	fmt.Printf("%#U\n", t.ToLower(uci))
	fmt.Printf("%#U\n", t.ToTitle(uci))
	fmt.Printf("%#U\n", t.ToUpper(uci))
}
And the output is:
U+0069 'i'
U+0130 'İ'
U+0130 'İ'
U+0069 'i'
U+0130 'İ'
U+0130 'İ'