
Support special string operations for Turkish and Azeri languages

Open doganulus opened this issue 3 years ago • 2 comments

Turkish (tr) and Azeri (az) treat dotted and dotless i as separate letters: i/İ and ı/I. To typeset these languages correctly, uppercasing must convert lowercase dotted i to uppercase dotted İ, and lowercasing must convert uppercase dotless I to lowercase dotless ı.

Go already supports these special case mappings through the following functions:

  • https://pkg.go.dev/strings#ToUpperSpecial
  • https://pkg.go.dev/strings#ToLowerSpecial
  • https://pkg.go.dev/strings#ToTitleSpecial

Adding this feature to Hugo will improve Turkish and Azeri language support.

doganulus · Aug 03 '22 18:08

According to the Godoc for these functions (e.g. https://pkg.go.dev/strings#ToLowerSpecial):

ToTitleSpecial returns a copy of the string s with all Unicode letters mapped to their Unicode title case, giving priority to the special casing rules.

Where do we get the "special casing rules" for a given language?

bep · Aug 03 '22 21:08

I think these rules are already implemented in the unicode package as unicode.TurkishCase and unicode.AzeriCase. They are the only special casing tables the package defines.
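One way Hugo could pick the rules for a given language is a small lookup that returns unicode.TurkishCase or unicode.AzeriCase for tr and az and falls back to the default mappings otherwise. This is only a sketch: specialCaseFor and upperFor are hypothetical helpers, not existing Hugo or Go APIs, and matching on the primary subtag before the first hyphen is an assumption about how language codes are written in the site config.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// specialCaseFor is a hypothetical helper (not part of Hugo or the Go
// standard library) that picks a casing table from a language code.
// Package unicode only defines special tables for Turkish and Azeri.
func specialCaseFor(lang string) (unicode.SpecialCase, bool) {
	// Assumption: compare only the primary subtag, so "tr-TR" and
	// "az-Latn" match as well as "tr" and "az".
	primary := strings.ToLower(strings.SplitN(lang, "-", 2)[0])
	switch primary {
	case "tr":
		return unicode.TurkishCase, true
	case "az":
		return unicode.AzeriCase, true
	}
	return nil, false
}

// upperFor is a hypothetical language-aware wrapper around strings.ToUpper.
func upperFor(lang, s string) string {
	if c, ok := specialCaseFor(lang); ok {
		return strings.ToUpperSpecial(c, s)
	}
	// No special rules: use the default Unicode mapping.
	return strings.ToUpper(s)
}

func main() {
	fmt.Println(upperFor("tr", "istanbul"))    // İSTANBUL
	fmt.Println(upperFor("az-Latn", "şirvan")) // ŞİRVAN
	fmt.Println(upperFor("en", "istanbul"))    // ISTANBUL
}
```

Lower and title variants would follow the same pattern with strings.ToLowerSpecial and strings.ToTitleSpecial.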

Besides the functions in the strings package, the unicode package documentation also gives the following example:

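```go
package main

import (
	"fmt"
	"unicode"
)

func main() {
	// unicode.TurkishCase is a unicode.SpecialCase table of Turkish casing rules.
	t := unicode.TurkishCase

	// Lowercase dotted i: title and upper case become dotted İ (U+0130).
	const lci = 'i'
	fmt.Printf("%#U\n", t.ToLower(lci))
	fmt.Printf("%#U\n", t.ToTitle(lci))
	fmt.Printf("%#U\n", t.ToUpper(lci))

	// Uppercase dotted İ: lower case becomes dotted i (U+0069).
	const uci = 'İ'
	fmt.Printf("%#U\n", t.ToLower(uci))
	fmt.Printf("%#U\n", t.ToTitle(uci))
	fmt.Printf("%#U\n", t.ToUpper(uci))
}
```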

And the output is:

```
U+0069 'i'
U+0130 'İ'
U+0130 'İ'
U+0069 'i'
U+0130 'İ'
U+0130 'İ'
```

doganulus · Aug 03 '22 22:08