latex3 \text_titlecase:n and string variables

The \text_titlecase:n command does not seem to work correctly with string variables. The following document gives me the output “ABC”, but it should be “Abc”.

\documentclass{article}
\begin{document}
\ExplSyntaxOn
\str_new:N \l_test_str
\str_set:Nn \l_test_str {abc}
\text_titlecase:n {\l_test_str}
\ExplSyntaxOff
\end{document}

Version numbers:

This is XeTeX, Version 3.141592653-2.6-0.999993 (TeX Live 2021) (preloaded format=xelatex)
LaTeX2e <2021-06-01> patch level 1
L3 programming layer <2021-08-27>

Aug 30 '21 18:08 wehro

The core issue here is that we decided to detect the first 'letter' by the catcode of each token, which seems reasonable in most cases. The characters in a string variable (other than spaces) are catcode-12 tokens, so none of them counts as a letter. I guess we could cover strings by also checking the current catcode of the character itself, but this seems to be getting a bit involved.

Aug 30 '21 21:08 josephwright

I think this is a confusing choice, and I agree that "Abc" is the more expected result. That would mean using \int_compare:nTF { 11 = \char_value_catcode:n {`#1} } instead of \token_if_eq_catcode:NNTF X #1

Aug 30 '21 23:08 blefloch
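
A minimal sketch of how the two tests differ, assuming a normal catcode regime. The helper \test_show_letter:N is a hypothetical name used only for illustration, not part of the kernel. It maps over the characters of a string variable and of a token list and typesets the outcome of both tests for each character.

```latex
\documentclass{article}
\begin{document}
\ExplSyntaxOn
\str_new:N \l_test_str
\str_set:Nn \l_test_str { abc }
% Hypothetical helper: typeset the result of both tests for the single
% character token #1
\cs_new:Npn \test_show_letter:N #1
  {
    #1 :~
    % Test 1: category code carried by the token itself
    \token_if_eq_catcode:NNTF X #1
      { token~is~a~letter } { token~is~not~a~letter }
    ;~
    % Test 2: category code currently assigned to the character
    \int_compare:nTF { 11 = \char_value_catcode:n { `#1 } }
      { char~has~catcode~11 } { char~does~not~have~catcode~11 }
    \par
  }
\str_map_function:NN \l_test_str \test_show_letter:N % string: catcode-12 tokens
\tl_map_function:nN { abc } \test_show_letter:N      % token list: catcode-11 tokens
\ExplSyntaxOff
\end{document}
```

For the characters of \l_test_str the token-based test should take its false branch while the \char_value_catcode:n test takes its true branch; for the token list both tests should succeed.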

I think I went this way as we are expecting 'text', which implies a somewhat predictable catcode regime. However, we can certainly adjust here: I guess the question is whether to have just \char_value_catcode:n or to use a \bool_lazy_or:nn test and cover both?

Aug 31 '21 08:08 josephwright
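
The combined test mentioned above could look roughly like the following. This is only a sketch and not the kernel's implementation: \test_if_letter:N is a hypothetical name, and the argument is assumed to be a single explicit character token.

```latex
\documentclass{article}
\begin{document}
\ExplSyntaxOn
% Hypothetical conditional: true if the token #1 is a letter token, or if
% the character it represents currently has category code 11
\prg_new_conditional:Npnn \test_if_letter:N #1 { p , T , F , TF }
  {
    \bool_lazy_or:nnTF
      { \token_if_eq_catcode_p:NN X #1 }
      { \int_compare_p:n { 11 = \char_value_catcode:n { `#1 } } }
      { \prg_return_true: }
      { \prg_return_false: }
  }
% Usage: the characters of a string are catcode-12 tokens, so only the
% second test can succeed for them
\str_map_inline:nn { abc }
  { \test_if_letter:NTF #1 { #1 :~ letter } { #1 :~ not~a~letter } \par }
\ExplSyntaxOff
\end{document}
```

The lazy evaluation means the \char_value_catcode:n lookup is only performed when the token-based test fails, so ordinary text takes the same path as before and only tokens such as string characters reach the second test.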

@josephwright Why not use just a fast check on the catcode? Something like:

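% Expandable test: does the character #1 currently have catcode 11 (letter)?
\prg_new_conditional:Npnn \my_if_letter:N #1 { TF }
  {
    % Expand \the\catcode`#1 first so that its digits precede the marker below
    \exp_after:wN \__my_if_letter:w \tex_the:D \tex_catcode:D `#1 \exp_stop_f:
      \__my_if_letter_true:w 11
    \if_false:
      \prg_return_true:
    \else:
      \prg_return_false:
    \fi:
  }
% Discard everything up to and including the first "11"
\cs_new:Npn \__my_if_letter:w #1 11 {}
% Only reached when the catcode itself was 11: swap \if_false: for \if_true:
\cs_new:Npn \__my_if_letter_true:w 11 \if_false: { \if_true: }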

Aug 31 '21 10:08 Skillmon

@Skillmon Speed is not the issue here: all of the token-by-token mappings are relatively slow. It's a question of what we want to support.

Aug 31 '21 10:08 josephwright

I suspect I'll just have to make a call here ...

Sep 09 '21 17:09 josephwright

I suspect this links to https://github.com/latex3/latex3/pull/1141: if we take the latter approach, the same data structure could be used to save the Unicode class information into the format. I can then adjust the case changer to look at the Unicode class of character tokens rather than the catcode. That would be as expected by the Unicode algorithm and would be more appropriate, in particular in pdfTeX. Thoughts?

Nov 07 '22 13:11 josephwright
