unidecoder
FYI: Text Transliteration in stringi
Hi @rich-iannone, just for your information: yesterday I added an interface to ICU's Transliterator in stringi, see this issue....
Some examples:
> stri_trans_general("zażółć gęślą jaźń", "Latin-ASCII") # Polish text
[1] "zazolc gesla jazn"
> stri_trans_general("„groß”©", "Latin-ASCII")
[1] ",,gross\"(C)"
> stri_trans_general("stringi", "Latin-Greek")
[1] "στριγγι"
> stri_trans_general("stringi", "Latin-Cyrillic")
[1] "стринги"
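The transform IDs used above ("Latin-ASCII", "Latin-Greek", "Latin-Cyrillic") are ICU identifiers. As a rough sketch, and assuming a stringi version that includes the transliterator interface, the available IDs can be listed with stri_trans_list(), and several transforms can be chained with a semicolon. The variable names `ids` and `latin_ids` are only illustrative, and the exact set of IDs depends on the ICU version stringi was built against.

```r
library(stringi)

# List the transliterator IDs known to the underlying ICU library.
ids <- stri_trans_list()

# Narrow the list down to transforms whose ID starts with "Latin-".
latin_ids <- ids[stri_detect_regex(ids, "^Latin-")]
head(latin_ids)

# Compound ID: first convert any script to Latin, then fold to ASCII.
# (Output is not shown here because it may vary with the ICU version.)
stri_trans_general("стринги", "Any-Latin; Latin-ASCII")
```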
There's also some capability in base R:
td <- readLines(curl::curl("https://raw.githubusercontent.com/rich-iannone/UnidecodeR/master/inst/examples/Totentanz__de.txt"))
iconv(td, to = "ASCII//translit")
But as far as I can tell, iconv's transliteration feature produces different output on different platforms, which is a problem in many cases.
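To make the comparison concrete, here is a minimal sketch that puts the two approaches side by side. The helper name `to_ascii` is hypothetical and not part of stringi or base R. The ICU-based result should not depend on the system's iconv implementation, although it may still differ between ICU versions.

```r
library(stringi)

# Hypothetical helper: fold Latin-script text to ASCII via ICU.
to_ascii <- function(x) {
  stri_trans_general(x, "Latin-ASCII")
}

x <- "zażółć gęślą jaźń"  # Polish text from the example above

# ICU-based transliteration.
to_ascii(x)
# [1] "zazolc gesla jazn"

# Base R alternative; the result depends on the platform's iconv
# (it may contain "?" or be NA), so no output is shown here.
iconv(x, from = "UTF-8", to = "ASCII//TRANSLIT")
```

Wrapping the call in a small function keeps the transform ID in one place, so it can later be swapped for a compound ID if non-Latin input has to be handled as well.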