Junicode-font Duplicate Alternates in Character Variants

Duplicate Alternates in Character Variants

Open kenmcd opened this issue 3 years ago • 17 comments

Some of the character variants have multiple duplicates among the available alternates. This appears to have started between v1.006 and v1.007, and it keeps getting worse. As a result, applications like Affinity Publisher display all those duplicates as available alternates. The alternates also do not work properly in LibreOffice, because the alternate numbers no longer match the documentation. And because the alternate numbers have changed between versions, a document created with v1.006 may be broken in v1.010.

Below is an image of the cv01 code from v1.010. I have highlighted the duplicates in yellow.

[Image: Duplicates in Character Variants, A, v1.010]

There are a lot of these. Take a look at: Aa cv01, Ee cv07, Ff cv09, Gg cv10, etc., etc.

May 25 '21 21:05 kenmcd

Several versions ago I ran into a problem with the OpenType specification for cv01-cv99, which reads in part, "Within each 'cvXX' feature, the number of variants should be identical for all glyphs." So a feature record that looked like this

sub a    from [ a.alt3 a.alt1 a.alt2 a.alt4 a.alt5 ] ;
sub A    from [ A.alt1 A.alt2 A.alt3 ] ;
sub a.sc from [ a.alt1.sc ] ;

(which is the kind of thing I had in the code in earlier versions) was incorrect, since they should all have the same number of substitutions to choose from. And yet the MUFI specification doesn't provide the same number of variants for each flavor of A!
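To make the rule concrete, here is a minimal Python sketch that counts the alternates per input glyph and reports whether a feature meets the "identical number of variants" recommendation. The dictionary mirrors the feature record quoted above; the function names `variant_counts` and `is_uniform` are hypothetical, and nothing here reads a real font file.

```python
def variant_counts(feature):
    """Map each input glyph to the number of alternates it offers."""
    return {glyph: len(alts) for glyph, alts in feature.items()}


def is_uniform(feature):
    """True when every input glyph has the same number of alternates."""
    return len(set(variant_counts(feature).values())) <= 1


# Same data as the feature record quoted above (one entry per 'sub ... from').
cv01 = {
    "a": ["a.alt3", "a.alt1", "a.alt2", "a.alt4", "a.alt5"],
    "A": ["A.alt1", "A.alt2", "A.alt3"],
    "a.sc": ["a.alt1.sc"],
}

counts = variant_counts(cv01)
uniform = is_uniform(cv01)
print(counts, uniform)
```

Here the three glyphs offer 5, 3, and 1 alternates, so the check fails, which is the situation the spec sentence warns about.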

It is possible, however, to pad the lookups, and that's what you're seeing here. An advantage of this way of doing things is that equivalent variants of the three As can be aligned so that they take the same index. A is a rather involved case to use as an example of this, but look at the variants for M (cv18): If you apply cv18[1] or cv18[2] to an M, it doesn't matter if the letter is lowercase, uppercase or small cap: you will still get an equivalent variant. Only one variant is defined for "M WITH RIGHT DESCENDER," so if you select that, capital and small cap M will not change. So you can, say, first apply cv18 to a lowercase m, and then capitalize it or small cap it, and the variant you get won't be a surprise.

If there were no duplications in the lookup, "LATIN EPIGRAPHIC LETTER ARCHAIC M" (now in column 4, cv18[4]) would end up in column 3 (cv18[3], implicitly equivalent to "M WITH RIGHT DESCENDER")—which wouldn't make a lot of sense.
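The padding idea can be sketched in Python as follows. This is only an illustration under stated assumptions: the glyph names are hypothetical, the placement of the archaic M in the uppercase row is a guess, and empty cells are filled with the input glyph itself so that selecting them changes nothing. The actual font sources may pad differently.

```python
def pad_aligned(grid, n_columns):
    """Turn a sparse {glyph: {column: alternate}} grid into full-length lists.

    Columns are 1-based, matching the cvXX[n] notation. A missing cell is
    filled with the input glyph itself (an assumption made for this sketch),
    so applying that index leaves the letter unchanged.
    """
    return {
        glyph: [cells.get(col, glyph) for col in range(1, n_columns + 1)]
        for glyph, cells in grid.items()
    }


# Hypothetical glyph names; column 3 = right descender, column 4 = archaic M.
cv18_grid = {
    "m": {1: "m.var1", 2: "m.var2", 3: "m.rdesc"},
    "M": {1: "M.var1", 2: "M.var2", 4: "M.archaic"},
    "m.sc": {1: "m.sc.var1", 2: "m.sc.var2"},
}

padded = pad_aligned(cv18_grid, 4)
for glyph, alternates in padded.items():
    print(glyph, alternates)
```

Every row now has four entries, and the archaic M stays at index 4 instead of sliding into index 3.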

So those duplications (or I would say "that padding") in the cvXX lookups have some advantages: compliance with the spec and consistency in a complex system of lookups. But of course, there are always tradeoffs, and you have discovered one: if a program allows you to choose from a list of cvXX variants for a character (rather than selecting via an index as in XeLaTeX and LibreOffice), you'll see all that padding.

If there's another solution for the problems I've outlined here, I'll of course be happy to hear and discuss and think about it.

May 26 '21 01:05 psb1558

Here btw are the variants for A. You can see from the table why I have (tentatively) included variants for the a-digraphs in this lookup.

May 26 '21 01:05 psb1558

I just downloaded a trial copy of Affinity Publisher (Mac) to see for myself what's going on. The cvXX interface in this version is seriously bad: if I select, say, an A, it displays all cv variants, not just those for A, in a very long list that does not scroll. I understand that this is young software, and perhaps it will improve in future versions. I believe I downloaded a beta of this some time ago and was very unimpressed, but it has come a long way since then.

The "..." button, on the other hand, produces a very nice list of all the variants for A classified helpfully by lookup, with many duplicates removed (you've got to check "Hide irrelevant features" or I believe it shows everything for all languages registered in the font).

I forgot to say something about things changing between versions. One reason I haven't formally released this font is that I want to have the italic finished first. But another is that I want to be able to change things when necessary. I'm trying not to do this any more than necessary, but it will happen as long as it is beta.

With Junicode, on the other hand, things are guaranteed not to change except for occasional bug fixes, and the character set in the (non-bold) italic matches that of the roman.

May 26 '21 22:05 psb1558

The Affinity Publisher interface for Character Variants has been changed and upgraded a lot in the most recent Windows beta. I don't think this has even made it into the Mac beta yet. Working on that is why I was looking at JuniusX: I wanted a modified font that they could use to program the new features. But after seeing all the duplicates, I modified Source Code Pro and sent them that instead. Source Code Pro also already had multiple-language support in the basic CVxx names.

Background ... a well-known font designer posted a request in the Affinity forum to properly support the extended info for Character Variants, as he was adding this to an update of one of his well-known font families. I thought he was going to send them some fonts. Weeks later I asked what was happening with this. Answer: they had no fonts with those features to program against. So I added those features (cv name, tooltip, sample text and parameters) to the Source Code Pro font (I was going to use JuniusX, but the duplicates), and then I discovered that Charis SIL also has these features, so I sent that too. The Affinity folks used those fonts to re-program the character variants interface. So it now looks like this (in the Windows v1.9.4.1076 Beta). Note: it still needs work to get it right.

[Image: Character Variants with description and tooltip]

That is what I wanted to add to JuniusX, but saw all the duplicates. Anyone using JuniusX in Affinity apps would benefit greatly from this.

Any advanced DTP application is going to list all those duplicates: Adobe InDesign, Illustrator, CorelDRAW, QuarkXPress, etc.

I kinda remember us talking about this before: if the variants are listed from those with the most variants to those with the least, it still works, regardless of the specs (which I also remember). I have not yet done thorough testing, but yesterday I did modify v1.010, leaving the variants in the same order (not reordered as I mentioned above), and it worked. As I said, I have not yet tested in multiple apps, but I will test more later. If it all works, in all apps, my vote would be to remove all the confusing duplicates. That is a lot of visual noise to contend with.

I too like the organized grid, but that is not what users are going to see. In LibreOffice they see absolutely nothing. In advanced apps they see all the duplicates.

Last night I was thinking about this and wondered if the overall approach to grouping the character variants should be different. Some fonts treat the character variants more like additional stylistic sets. In one CVxx you can have multiple replacements, and multiple variants, and contextual chaining (see Iosevka for example). Should the character variants be grouped by languages rather than by alphabet? What if it was instead:

Polish-Old
Polish-Really-Old
Polish-Really-Old-Rare

Does it make more sense to group them together by language, or some other method? Keep in mind that each CVxx can have replacements, variants, and context.

May 26 '21 23:05 kenmcd

If you had sent JuniusX to the Affinity people, they might have found it a trivial matter to program the interface in such a way that it skips over cases where a cvXX feature at a particular index has no effect (heck, even I could do that, and I'm hardly a programmer at all!). Assuming that the font was defective, you didn't send it. But it's not defective. It conforms to the spec—which is what I like to do whenever possible. I confess that I have deviated from the spec on the rare occasion when I haven't been able to figure out another way to get a thing done, but that's not the case here, so any solution has got to conform to the spec.

As of right now, Affinity Publisher is the only app I know of that lets users access the cvXX features via the GUI. InDesign doesn't, and I suspect that it won't ever, since they prefer to provide access to character variants via aalt. QuarkXPress doesn't either, and I doubt that they will in the foreseeable future, given that they aren't even using a font's native small caps yet (ugh!). Don't get me started about MS Word.

Some important apps do provide access to cvXX: LibreOffice, XeTeX/LuaTeX, all the major browsers. They access the variants by index, and what I have done is perfect for users of those apps, since it enables them to get at, e.g., "neckless" variants of a/æÆ/ꜵ/ꜹ with the same index, cv01[5]. I don't know how to provide that kind of convenience without using what are, in effect, sparse arrays.

I really celebrate Affinity Publisher for making these variants accessible via GUI. I would suggest that the easiest solution to this problem is that they make the minor alteration required to take account of cvXX features like those in JuniusX. These features haven't been around very long: if I've done this, I'm sure others will as well.

May 27 '21 03:05 psb1558

On Wed, May 26 2021 at 16:08 -07, kenmcd wrote:

[...]

Should the character variants be grouped by languages rather than by alphabet? What if it was instead:

Polish-Old

Polish-Really-Old

Polish-Really-Old-Rare

Looks very cumbersome for the user...

The question of how to present variants in a GUI is slightly related to the question of how to present them in a font table, e.g. one produced by fntsample, cf.

https://github.com/psb1558/Junicode-New/discussions/49

I understand Unicode just ignores the variants in its tables.

BTW I think it should be possible to encode variants in plain text, cf.

https://github.com/psb1558/Junicode-New/discussions/44

However, I'm afraid I don't understand all the intricacies of the problem :-(

May 27 '21 05:05 jsbien

I would like, if possible, to separate the problem of how best to organize a complex collection of variants in a font from how to present variants visually, whether in a GUI, like the ones @kenmcd and I have been discussing, or in programmatically generated font tables like the ones @jsbien mentions.

I would maintain that it is a positive value to organize variants in ways that (1) are standards-compliant and (2) group forms (like case-pairs) that belong together on the same index. If that results in many cases where application of a feature has no effect, that's fine, because it's the job of a programmer working on the visual presentation of a font's features to take account of different (even unusual) ways of organizing variants in a font. Thus the Adobe InDesign GUI shown above "sees" (scare quotes hiding several qualifications) that there are thirteen unique variants of A and displays only those.

And so I'd suggest that the most useful response to a font like JuniusX is "This presents an interesting problem for the designer of a GUI" and not "This font is broken."

I have also been thinking about how to present this material visually, and for the time being have settled on something like this: I didn't generate this programmatically, but I could have, and the pseudo-code relevant to the problem discussed here would be simply

if input-glyph != output-glyph:
   display output-glyph
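
A runnable Python version of that idea might look like the sketch below. It goes one step beyond the pseudo-code: besides skipping entries where the output equals the input, it also skips outputs that were already shown, which would matter if padding repeats an alternate. The function name `visible_variants` and the glyph names are hypothetical; the original index is kept so that a GUI could still apply the right cvXX[n].

```python
def visible_variants(input_glyph, alternates):
    """Return (index, glyph) pairs worth showing in a variant picker.

    Indexes are 1-based, as in cvXX[n]. Entries that leave the glyph
    unchanged, or that repeat an alternate already listed, are skipped.
    """
    seen = set()
    shown = []
    for index, output_glyph in enumerate(alternates, start=1):
        if output_glyph == input_glyph or output_glyph in seen:
            continue
        seen.add(output_glyph)
        shown.append((index, output_glyph))
    return shown


# A padded row for uppercase M (hypothetical names; index 3 is padding).
row = ["M.var1", "M.var2", "M", "M.archaic"]
picker = visible_variants("M", row)
print(picker)
```

The padded entry at index 3 disappears from the list, while the archaic M keeps its index 4.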

May 27 '21 12:05 psb1558

I find that spec requirement for cvXX features rather odd, and I’m not aware of any implementation that enforces it, so I submitted a feedback issue to Microsoft asking about this: https://github.com/MicrosoftDocs/typography-issues/issues/778

May 28 '21 12:05 khaledhosny

Thanks for looking into this, @khaledhosny. As I know you have more knowledge and experience than most in this area, I wonder what you'd suggest, assuming that MS answers "We actually don't care." The MUFI material is often like the grid for the Aa variants, not like SIL Charis, where every space in the grid is taken. If there is some value in having, say, the uncial variants and the neckless variants collected on the same index, how do you do it?

May 28 '21 13:05 psb1558

That seems like a good case for an ssXX feature. I’d keep the cvXX features with no duplicates, and if some alternates form a set across different glyphs, I’d add an ssXX feature for them (after all, that is pretty much the difference between these two groups of features).

May 28 '21 14:05 khaledhosny

Thanks for your thoughts, Khaled.

May 28 '21 16:05 psb1558

As no one seems to think the current arrangement a good one, I have put things back more or less the way they were in Oct. 2020. With some differences: I had added small cap forms to the cvXX entries where they were relevant, and they are still there. Also, I have made sure that when a set is full—that is, when there are lowercase, uppercase, and small cap forms—that set goes first in the sequence.

If MS insists that sets all have to be all the same length, I'll pad the ends of the sets. (Too bad there's not a "None" placeholder. There's a NULL, but it's supposed to delete a character, and in any case it doesn't appear to be implemented.)

When the fonts are ready I'll post them, along with a revised feature reference.

May 28 '21 19:05 psb1558

(Too bad there's not a "None" placeholder. There's a NULL, but it's supposed to delete a character, and in any case it doesn't appear to be implemented.)

I thought about that too. Ignore for alternates display. Take into account for variant number. Your table organization would work, variant numbers would work, and no confusing duplicates. But, not an option.

Hopefully the MS folks will reply there is no need for the stated requirement, so no need for concern about following the standard.

May 28 '21 19:05 kenmcd

Following up on Khaled Hosny’s query (long post—sorry):

In May, Khaled submitted a question to MicrosoftDocs/typography-issues about the rationale for the sentence of the OT spec for the cvXX features that reads, “Within each ‘cvXX’ feature, the number of variants should be identical for all glyphs.” The question went unanswered for some months, and then a few days ago I bumped it, and a good discussion ensued involving Peter Constable of Microsoft and John Hudson of Tiro Typeworks. Unfortunately, Khaled has not weighed in yet.

John Hudson laid out the rationale especially clearly:

The rationale is two-fold: a) the features are intended for variants of individual characters, not for sets of characters (for which the ssxx features would be appropriate), so are expected to be used for e.g. uppercase A and its diacritic forms, and not for both upper- and lowercase characters unless they happen to follow the same pattern of variants; and b) having the same number of variants for each input glyph in the feature allows for the feature to be applied across a body of text with the same enumerated variant producing predictable results on all glyphs affected by the feature.

Of the two rationales, the first seems equivocal (as the spec itself is), but the second is salient. If cv01[1] substitutes lowercase insular a for the font’s default lowercase a, we don’t want cv01[1] to perform quite a different substitution for uppercase A. That has been a concern of mine all along. I can see now that that concern led me to a rather absurd place, but the concern remains a valid one, and it is not addressed by the current version of JuniusX, which does not follow the spec in this respect.

Peter Constable pointed out that the spec reads “should,” not “must” or “shall.” That is, the spec recommends that the number of variants be the same for all glyphs in a cvXX feature but does not require it. That would seem to open the way for a practice like that of JuniusX, but further questioning brought out that the spec does not define the behavior of the layout engine when the recommendation is not followed. The layout engines I know (not many, frankly) all do the same thing when a document attempts to apply (for example) cv01[6] to an A when the last variant for A is cv01[5]: they reproduce the default A. But an engine could just as well reproduce the last variant in the list, or it could drop the letter entirely, and in both cases its behavior would be completely in compliance with the spec. Even if no layout engines are doing these things now, there’s no guarantee that some future engine won’t do them.
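
The three possible out-of-range behaviors can be sketched in Python. This models no real layout engine; the function `apply_variant`, the `policy` names, and the glyph names are all hypothetical and exist only to show why an index past the end of the list is unpredictable.

```python
def apply_variant(glyph, alternates, index, policy="default"):
    """Pick alternate number `index` (1-based) for `glyph`.

    When `index` is past the end of the list, the result depends on
    `policy`, standing in for three imaginable engine behaviors:
      "default" -> keep the default glyph
      "last"    -> fall back to the last alternate in the list
      "drop"    -> emit nothing (None)
    """
    if 1 <= index <= len(alternates):
        return alternates[index - 1]
    if policy == "default":
        return glyph
    if policy == "last":
        return alternates[-1] if alternates else glyph
    if policy == "drop":
        return None
    raise ValueError(f"unknown policy: {policy}")


# Uppercase A with five alternates (hypothetical names), asked for cv01[6].
A_alts = ["A.alt1", "A.alt2", "A.alt3", "A.alt4", "A.alt5"]
results = {p: apply_variant("A", A_alts, 6, p) for p in ("default", "last", "drop")}
print(results)
```

All three outcomes would be allowed for the same request, which is the argument for giving every glyph in a feature the same number of variants.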

It therefore seems best that JuniusX should be in compliance with the spec. It emerged in an exchange with John Hudson that the best course would be to separate uppercase and lowercase letters into different features. Doing so expands the number of features: in the collection of features cv01–cv99, all but the last one are either used or reserved for possible future use.

I have laid out a proposed new arrangement of the cvXX features in a spreadsheet, cv_plan.ods (a LibreOffice document) and an associated .pdf file, both in the docs folder. This should do the work JuniusX wants to do in a standards-compliant way and without the duplication (or padding) that was the reason for the launch of this thread. Comments welcome.

I worry that recent revisions of the JuniusX OT features may be frustrating. I’d like to make a couple of points about that before signing off.

First, I’ve been clear (I hope) that nothing in JuniusX is stable until there’s been a release (which includes, hopefully, a complete matching italic). Indeed, the reason there has been no release since the beginning of this project is so I can change things around when they need changing. People who need this font for a production purpose should use legacy Junicode.

Second, one of my main purposes in creating JuniusX is to provide an accessible method for accessing the wealth of characters in the Medieval Unicode Font Initiative (MUFI) recommendation. By “accessible” I mean standards-compliant, such that all the base characters in any text can be Unicode-standard (not Private Use Area), with variant letter-shapes, archaic ligatures, etc. handled by OpenType features. I am trying to devise a system that can be applied to any MUFI-compliant font so as to enable a high level of accessibility. For this purpose it seems better to rework things fundamentally, if that’s what it takes to create a rational system, rather than apply patches that will just get us by.

Jul 28 '21 23:07 psb1558

Hi Peter, I have been watching the whole discussion over there (glad you bumped it and got a response). As I am also subscribed to the Microsoft typography-issues repo, I see all the discussions there. And this is not the first time I have seen some needed clarification, or sometimes even changes in the specs. With that said I am not as concerned about "following the specs" in this case, because it seems no one is. It appears no app is actually enforcing this, but it was interesting to hear the original thinking on why this was written. My biggest concern was just the clutter and confusion in the user interface.

It therefore seems best that JuniusX should be in compliance with the spec. It emerged in an exchange with John Hudson that the best course would be to separate uppercase and lowercase letters into different features. Doing so expands the number of features: in the collection of features cv01–cv99, all but the last one are either used or reserved for possible future use.

I have laid out a proposed new arrangement of the cvXX features in a spreadsheet, cv_plan.ods (a LibreOffice document) and an associated .pdf file, both in the docs folder. This should do the work JuniusX wants to do in a standards-compliant way and without the duplication (or padding) that was the reason for the launch of this thread. Comments welcome.

I saw the discussion about that as a solution and it makes sense to me. JuniusX is quite unique in having sooooooo many alternates that it has to deal with in a rational manner. I think this is a good solution, and I like the way it will all end up being organized. It is different from the way it is now, but still nicely organized in a way that users will understand.

Now you have to move it all around ! ! :-) I will review and double-check it for you when it is ready.

Jul 28 '21 23:07 kenmcd

Thank you, @kenmcd. I'm going to let this plan cool for a while, doing some experiments on copies of the source file before I try to implement it.

Fortunately, I enjoy the work, especially when things seem to be moving in the right direction.

Jul 29 '21 00:07 psb1558

Version 1.011, with revised cvNN features, is now up, along with a revised Feature Reference to explain it.

Aug 30 '21 02:08 psb1558
