
\textcommabelow as Unicode accent

Open larsgw opened this issue 5 years ago • 12 comments

I was looking through tuenc.def and apparently \textcommabelow is not defined as the COMBINING COMMA BELOW (U+0326), but rather something more complex:

https://github.com/latex3/latex2e/blob/59b3e833a5507d01c78171c9e42570485aea2b4d/base/ltoutenc.dtx#L1563-L1566

Does this have to do with the fact it is defined with \DeclareTextCommand? If not, what is the reason it is not defined like the other accents?

larsgw avatar Mar 21 '20 17:03 larsgw

I guess it's because the combining comma below works badly with Latin Modern:

\documentclass{article}

\usepackage{fontspec}

\begin{document}

\def\test{%
  \textcommabelow S
  \textcommabelow V
  V^^^^0326\par
}

\test

\fontspec{DejaVu Serif}\test

\fontspec{Old Standard}\test

\end{document}
[image: output of the test in Latin Modern, DejaVu Serif, and Old Standard]

Probably not too good a reason. The first character is copied as Ș in all three cases.

eg9 avatar Mar 21 '20 20:03 eg9

Given that LM is the default font family with LaTeX on LuaTeX/XeTeX, I think it is reasonable to make it render correctly for that family (until LM is corrected). It should be reported to the LM developers, though, and the definition eventually normalized once the fonts are corrected.

FrankMittelbach avatar Mar 22 '20 10:03 FrankMittelbach

Note that tuenc.def contains only the following four composite commands:

\DeclareUnicodeComposite{\textcommabelow}{S}{"0218}
\DeclareUnicodeComposite{\textcommabelow}{s}{"0219}
\DeclareUnicodeComposite{\textcommabelow}{T}{"021A}
\DeclareUnicodeComposite{\textcommabelow}{t}{"021B}

It seems that these four are the only characters in Unicode that include this diacritic. Assuming that these four glyphs are ‘visually correct’ in LM, what further correction does LM need?

There are a few other uses of this diacritic that maybe will one day get into Unicode, but V is nowhere in that list.

car222222 avatar Mar 22 '20 10:03 car222222

I'm no expert on any of the languages that use the comma accent, but Latvian, for example, has k, l, r with comma accent (not in Unicode, which seems to have only the Romanian s and t), and at least the "l" comes out wrong without the special definition; there may be others.
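
A quick way to check this for a particular font is to typeset the three variants next to each other, in the style of the test earlier in the thread. This is only a sketch: the choice of l and of U+013C (the pre-composed letter that Unicode names “l with cedilla”) is an assumption made for illustration, and it needs LuaLaTeX or XeLaTeX.

```latex
% Compile with LuaLaTeX or XeLaTeX; Latin Modern is the default font.
\documentclass{article}
\begin{document}
\textcommabelow l \quad % current definition from tuenc.def
l^^^^0326 \quad         % base letter + COMBINING COMMA BELOW (U+0326)
^^^^013c                % pre-composed U+013C
\end{document}
```

Adding \fontspec lines as in the earlier test would show how other fonts compare.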

FrankMittelbach avatar Mar 22 '20 11:03 FrankMittelbach

Yes, S, T are only for Romanian.

Unicode uses the cedilla diacritic, rather than comma, for Latvian (and probably for all other uses). Note that it is also used with G, g. Its use with g (U+0123) is quite interesting.

Livonian has: ḑ, ļ, ņ, ŗ, ț

Does LM always get the cedilla correct?

car222222 avatar Mar 22 '20 11:03 car222222

The fact that we have

\DeclareUnicodeComposite{\textcommabelow}{s}{"0219}

means that the "expected" composites will go to the pre-composed characters so if we make a change it would only affect the use of the accent on unexpected letters.

We did at one point have a version of the Unicode accent setup with a three-way switch:

pre-composed if the font has it,
else combining character if the font has the combining character,
else TeX accent construct,

but the current version seems to have just two branches and to use the combining character or the TeX construct as specified in each case. I have a feeling there were complications with interfering with the heuristics HarfBuzz/luaotfload apply in this area, but I can't find the discussion now. @wspr, do you recall? I'm not sure whether it was here or in fontspec.
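
For illustration only, a three-way switch of the kind described above could be sketched as follows. This is not the code that was once in the kernel: \CommaBelowSketch is a hypothetical helper, the fallback overlay is a simplified stand-in for the real TeX accent construct, and the sketch ignores the shaper heuristics mentioned above.

```latex
% Compile with LuaLaTeX or XeLaTeX.
\documentclass{article}
% \CommaBelowSketch{<pre-composed code point>}{<base letter>}
% Hypothetical helper, not part of the LaTeX kernel.
\newcommand*\CommaBelowSketch[2]{%
  \leavevmode
  \iffontchar\font\numexpr#1\relax
    \char\numexpr#1\relax   % 1. pre-composed glyph, if the font has it
  \else\iffontchar\font"0326
    #2\char"0326\relax      % 2. base letter + COMBINING COMMA BELOW
  \else
    % 3. simplified TeX construct: a small comma centred below the letter
    \mbox{\ooalign{#2\crcr
      \hidewidth\raisebox{-.31ex}{\scriptsize,}\hidewidth}}%
  \fi\fi}
\begin{document}
\CommaBelowSketch{"0219}{s} \CommaBelowSketch{"013C}{l}
\end{document}
```

The tests use \iffontchar on the current font, so the branch taken can differ from font to font within one document.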

davidcarlisle avatar Mar 22 '20 12:03 davidcarlisle

@car222222 Unicode does not distinguish between cedilla and comma below, leaving the rendering to the font designer. Latvian tradition uses comma below (and above in the case of lowercase g) and indeed fonts usually comply.

An exception was later made for Romanian, because “S with cedilla” is used in Turkish with a real cedilla. According to Wikipedia, Ţ is used in the Gagauz alphabet (with cedilla, not comma, because modeled after the Turkish alphabet).

My opinion is that we should add combination \textcommabelow+(Latvian/Livonian letters) to point to the letter with cedilla, following Unicode, and leave alone Latin Modern with its bad positioning for unused letters, hoping they'll fix it.
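
For concreteness, the additions proposed here might look like the lines below. This is a sketch, not a tested patch: it assumes \DeclareUnicodeComposite can be used in the preamble with the same syntax as in tuenc.def, and the selection of letters (k, l, n, r) is illustrative. The code points are those of the pre-composed letters that Unicode names “with cedilla”.

```latex
% Compile with LuaLaTeX or XeLaTeX (TU encoding).
\documentclass{article}
% Assumption: \DeclareUnicodeComposite is available here as in tuenc.def.
\DeclareUnicodeComposite{\textcommabelow}{K}{"0136}% K with cedilla
\DeclareUnicodeComposite{\textcommabelow}{k}{"0137}
\DeclareUnicodeComposite{\textcommabelow}{L}{"013B}% L with cedilla
\DeclareUnicodeComposite{\textcommabelow}{l}{"013C}
\DeclareUnicodeComposite{\textcommabelow}{N}{"0145}% N with cedilla
\DeclareUnicodeComposite{\textcommabelow}{n}{"0146}
\DeclareUnicodeComposite{\textcommabelow}{R}{"0156}% R with cedilla
\DeclareUnicodeComposite{\textcommabelow}{r}{"0157}
\begin{document}
\textcommabelow k \textcommabelow l \textcommabelow n \textcommabelow r
\end{document}
```

Letters without a declared composite, such as V, would still go through the existing \textcommabelow definition.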

eg9 avatar Mar 22 '20 15:03 eg9

“Unicode does not distinguish between cedilla and comma below” . . . except, as noted, when it does (or will) so distinguish.

‘Unicode’ leaves all rendering to the font designer, does it not?

car222222 avatar Mar 22 '20 16:03 car222222

@car222222 What I wanted to underline is that, although Latvian has always used a comma below, Unicode arbitrarily decided to name it “cedilla”.

eg9 avatar Mar 22 '20 16:03 eg9

> An exception was later made for Romanian, because “S with cedilla” is used in Turkish with a real cedilla. According to Wikipedia, Ţ is used in the Gagauz alphabet (with cedilla, not comma, because modeled after the Turkish alphabet).

But what I read on the web is that Unicode got it wrong there, and that the real comma below is needed to get those languages right, rather than substituting a cedilla.

> My opinion is that we should add combination \textcommabelow+(Latvian/Livonian letters) to point to the letter with cedilla, following Unicode, and leave alone Latin Modern with its bad positioning for unused letters, hoping they'll fix it.

Given that the current setup gets it visually right for LM and other fonts, I would not touch it before LM is corrected, and I'm not happy about the idea of hardwiring the substitutions either. So my proposed course of action is to report it to the LM designers and drop the single accent definition once they have fixed it.

FrankMittelbach avatar Mar 22 '20 16:03 FrankMittelbach

I am still unsure about this: what exactly would be ‘visually right for LM and other fonts’? In other words, precisely what ‘fixes’ do the LM (and/or other font) designers need to make?

For the cedilla there is certainly no agreement about its shape or its positioning (even in the case of French orthography). When it is under one of the following (for example): k, m, n, then its ‘correct position’ can either be ‘centred’ or ‘under the rightmost stem’.

For some of this discussion, see:
https://unicode.org/L2/L2013/13155r-cedilla-comma.pdf This contains a lot of interesting but inconclusive information and comment from a wide variety of sources, mostly about Marshallese and Latvian; no Livonian, and the Romanian decision gets quite a hammering!

car222222 avatar Mar 23 '20 07:03 car222222

My personal \ae sthetic: the resizing of the diacritic in the built-up case (V) looks visually better to me. I assume this is the difference between these two examples from Enrico:

\textcommabelow S
\textcommabelow V

Thus, whilst personally I would not claim that the design of the S-with-comma-below in LM ‘needs fixing’, I am happy to describe it as ‘unbalanced’ or maybe even ‘ugly’.
Maybe there is a recent Romanian font design to show us what it should look like?

car222222 avatar Mar 23 '20 07:03 car222222