latex3
\text_titlecase:n and string variables
The \text_titlecase:n command does not seem to work correctly with string variables. The following document gives me the output “ABC”, but it should be “Abc”.
\documentclass{article}
\begin{document}
\ExplSyntaxOn
\str_new:N \l_test_str
\str_set:Nn \l_test_str {abc}
\text_titlecase:n {\l_test_str}
\ExplSyntaxOff
\end{document}
Version numbers:
This is XeTeX, Version 3.141592653-2.6-0.999993 (TeX Live 2021) (preloaded format=xelatex)
LaTeX2e <2021-06-01> patch level 1
L3 programming layer <2021-08-27>
The core issue here is that to detect the first 'letter' we decided to use the catcode of each token: that seems reasonable in most cases. We could I guess cover strings by checking the catcode of the char in general too, but this seems to be getting a bit involved.
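A minimal sketch of why the catcode test misses the string case, assuming the standard catcode regime and a kernel recent enough to provide \exp_args:Ne. The variable name \l_demo_str is made up for this illustration. The characters stored in a str variable carry category code 12 ("other"), so a token-level letter test on them fails:

```latex
\documentclass{article}
\begin{document}
\ExplSyntaxOn
% \l_demo_str is a hypothetical name used only in this sketch
\str_new:N \l_demo_str
\str_set:Nn \l_demo_str { abc }
% A literal "a" read under the usual regime has catcode 11,
% so this should take the true branch.
\token_if_letter:NTF a { letter } { other }
\par
% The first character of the str has catcode 12,
% so this should take the false branch.
\exp_args:Ne \token_if_letter:NTF { \str_head:N \l_demo_str } { letter } { other }
\ExplSyntaxOff
\end{document}
```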
I think this is a confusing choice and I agree that "Abc" would be more expected, namely using \int_compare:nTF { 11 = \char_value_catcode:n {`#1} } instead of \token_if_eq_catcode:NNTF X #1.
I think I went this way as we are expecting 'text', which implies a somewhat predictable catcode regime. However, we can certainly adjust here: I guess the question is whether to have just \char_value_catcode:n or to use a \bool_lazy_or:nn test and cover both?
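For illustration, the "cover both" option could look roughly like the sketch below. The conditional \my_if_letter_like:N is a hypothetical helper, not part of expl3 and not necessarily what the kernel would use. It accepts a character token if the token itself has category code 11, or if its character code is currently assigned category code 11:

```latex
\ExplSyntaxOn
% Hypothetical helper for this sketch only; #1 is assumed to be
% a single explicit character token.
\prg_new_conditional:Npnn \my_if_letter_like:N #1 { TF }
  {
    \bool_lazy_or:nnTF
      % the token as read is a letter (catcode 11)
      { \token_if_letter_p:N #1 }
      % or its character code currently has catcode 11
      { \int_compare_p:nNn { \char_value_catcode:n { `#1 } } = { 11 } }
      { \prg_return_true: }
      { \prg_return_false: }
  }
\ExplSyntaxOff
```

Under the standard regime, the first character of a str variable holding abc should pass this test through the second branch, while a digit would fail both. The cost is that the result then depends on the catcode table in force at the point of use, not only on the tokens themselves.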
@josephwright why not only some fast check for the catcode? Something like
\prg_new_conditional:Npnn \my_if_letter:N #1 { TF }
  {
    \exp_after:wN \__my_if_letter:w \tex_the:D \tex_catcode:D `#1 \exp_stop_f:
      \__my_if_letter_true:w 11
    \if_false:
      \prg_return_true:
    \else:
      \prg_return_false:
    \fi:
  }
\cs_new:Npn \__my_if_letter:w #1 11 {}
\cs_new:Npn \__my_if_letter_true:w 11 \if_false: { \if_true: }
@Skillmon Speed is not the issue here: all of the token-by-token mappings are relatively slow. It's a question of what we want to support.
I suspect I'll just have to make a call here ...
This I suspect links to https://github.com/latex3/latex3/pull/1141 - if we take the latter approach, the same data structure could be used to save the Unicode class information into the format. I can then adjust the case changer to look at the Unicode class of character tokens rather than the catcode - that would be as expected by the Unicode algorithm and would be more appropriate in particular in pdfTeX. Thoughts?