luautf8

Function title does the same thing as function upper

Open bpj opened this issue 1 year ago • 3 comments

It seems like function title does the same thing as function upper:

local lua_utf8 = require("lua-utf8")
local str = 'någonting, нічого, τίποτα'
print(lua_utf8.title(str))
print(lua_utf8.upper(str))
NÅGONTING, НІЧОГО, ΤΊΠΟΤΑ
NÅGONTING, НІЧОГО, ΤΊΠΟΤΑ

Just for clarity:

str = 'någonting, нічого, τίποτα'

title = NÅGONTING, НІЧОГО, ΤΊΠΟΤΑ

upper = NÅGONTING, НІЧОГО, ΤΊΠΟΤΑ

expected title = Någonting, Нічого, Τίποτα

luautf8 0.1.6-1

bpj, Jan 12 '25 15:01

FYI, this function wrapping upper, lower and gsub does what I would expect title casing to do:

local lua_utf8 = require("lua-utf8")
local u_upper = lua_utf8.upper
local u_lower = lua_utf8.lower
local u_gsub = lua_utf8.gsub
local u_title
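do
  -- Upper-case the first letter of a word and lower-case the rest.
  local title = function(u, l)
    return u_upper(u) .. u_lower(l)
  end
  -- One run of letters, delimited by frontiers between non-alphanumeric and alphanumeric characters.
  local pat = '%f[%w](%a)(%a*)%f[^%w]'
  u_title = function(s)
    -- Keep only gsub's first return value (the string) and drop the match count.
    local t = u_gsub(s, pat, title)
    return t
  end
end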
print(u_title('någonting, нічого, τίποτα'))

-- Någonting, Нічого, Τίποτα

bpj, Jan 12 '25 16:01

Maybe what you want is to capitalize the first letter of each word. That is not the same as "convert all letters into title case", which is what the function "title" does. (In the Unicode standard, title case is distinct from upper case.)
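To illustrate the distinction, here is a minimal sketch, assuming Lua 5.3 or later for the built-in utf8 library. The table is hand-written for illustration and covers only the four Latin digraph letters whose Unicode title-case form differs from their upper-case form; it is not how luautf8 stores its mappings. For most letters, including every letter in the example string above, the title-case mapping equals the upper-case mapping, which would explain the identical output.

```lua
-- Illustration only; requires Lua 5.3+ for the built-in utf8 library.
-- Hand-written table of code points: lower-case, upper-case and title-case forms.
local digraphs = {
  { lower = 0x01C6, upper = 0x01C4, title = 0x01C5 }, -- dz with caron digraph
  { lower = 0x01C9, upper = 0x01C7, title = 0x01C8 }, -- lj digraph
  { lower = 0x01CC, upper = 0x01CA, title = 0x01CB }, -- nj digraph
  { lower = 0x01F3, upper = 0x01F1, title = 0x01F2 }, -- dz digraph
}

-- Format one code point as "U+XXXX <character>".
local function show(cp)
  return string.format("U+%04X %s", cp, utf8.char(cp))
end

-- Print the three case forms side by side; upper and title differ in every row.
for _, d in ipairs(digraphs) do
  print(("lower %s  upper %s  title %s"):format(
    show(d.lower), show(d.upper), show(d.title)))
end
```

With luautf8 installed, passing these characters to lua_utf8.upper and lua_utf8.title should be a quick way to check whether the two functions really diverge, though that comparison was not run in this thread.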

starwing, Jan 12 '25 21:01

Title casing a word or string is much, much more complicated than just mapping some Unicode casing onto characters. (At least it is for prose; the issue of ASCII programming tokens is somewhat easier.) I suggest this library stick to the Unicode casing definitions and ignore the kettle of fish that is actual title casing. A naive implementation that title-cases the first letter (hint: the naive wrapper using gsub() in the comment above won't work for all languages either) is bound to fall flat in enough cases that I suggest it be left as an exercise to the library consumer, or better yet to an actual prose casing library. I can suggest my own decasify library if you want a LuaRock that handles English and Turkish title casing; contributions for other languages are welcome.
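A few concrete inputs show where a first-letter wrapper falls short. The sketch below is an ASCII-only stand-in for the gsub() wrapper from the earlier comment: it reuses the same pattern but calls the standard string library, so it runs without luautf8. The name naive_title and the sample strings are hypothetical, chosen only for illustration.

```lua
-- ASCII-only stand-in for the gsub-based wrapper, using Lua's standard string library.
-- naive_title is a hypothetical name used for this illustration.
local pat = '%f[%w](%a)(%a*)%f[^%w]'

local function naive_title(s)
  -- Upper-case the first letter of each run of letters, lower-case the rest.
  local t = s:gsub(pat, function(first, rest)
    return first:upper() .. rest:lower()
  end)
  return t -- drop gsub's second return value (the match count)
end

print(naive_title("don't stop"))            -- Don'T Stop
print(naive_title("the lord of the rings")) -- The Lord Of The Rings
print(naive_title("ijsselmeer"))            -- Ijsselmeer
```

The apostrophe splits "don't" into two words, so the trailing "t" gets capitalized. English titles conventionally keep short words such as "of" and "the" in lower case, and Dutch capitalizes the digraph "IJ" as a unit ("IJsselmeer"). None of these rules can be expressed as a per-character case mapping.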

alerque, Jan 13 '25 11:01