
[com.google.fonts/check/soft_hyphen] improve rationale

Open RosaWagner opened this issue 2 years ago • 19 comments

Observed behaviour

⚠ WARN: Does the font contain a soft hyphen? (com.google.fonts/check/soft_hyphen) ⚠ WARN This font has a 'Soft Hyphen' character. [code: softhyphen]

Expected behaviour

The check doesn't say what should be done with the soft hyphen. I would propose:

"The soft hyphen is sometimes designed as an empty glyph with no width (like a control character), sometimes identical to the regular hyphen, and sometimes double-encoded with the hyphen. That being said, it is recommended not to include it in the font at all, because discretionary hyphenation should be handled at the level of the shaping engine, not the font. Also, even if it is present, the software would not display that character. More discussion here: https://typedrawers.com/discussion/2046/special-dash-things-softhyphen-horizontalbar and here: https://github.com/googlefonts/fontbakery/issues/3486"

cc @vv-monsalve @twardoch thumbs up if you agree

RosaWagner, Mar 16 '23 13:03

Also, is it your intent to always report it as a WARN if it simply exists? Seems to me that the report should be:

  1. INFO: if u00AD is present and is at zero width with no outlines, as expected
  2. WARN: if u00AD is present but there is an outline or it is double encoded to u002D (hyphen)
  3. FAIL: if not present
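A minimal sketch of the three-level proposal above, in Python. To stay runnable without a font file, it takes plain dictionaries. In a real check these might be filled from fontTools (for example `TTFont.getBestCmap()` for the cmap, the `hmtx` table for advance widths, and each glyph's contour count for outlines). The function name and the dict inputs are hypothetical. The proposal does not say what to do with an empty glyph that has a non-zero width, so the sketch treats that case as WARN, which is an assumption.

```python
from __future__ import annotations

SOFT_HYPHEN = 0x00AD   # U+00AD SOFT HYPHEN
HYPHEN_MINUS = 0x002D  # U+002D HYPHEN-MINUS


def classify_soft_hyphen(cmap: dict[int, str],
                         advance: dict[str, int],
                         contours: dict[str, int]) -> str:
    """Return INFO, WARN or FAIL following the three-level proposal.

    cmap:     code point -> glyph name
    advance:  glyph name -> advance width in font units
    contours: glyph name -> number of contours (0 means no outline)
    """
    glyph = cmap.get(SOFT_HYPHEN)
    if glyph is None:
        return "FAIL"  # 3. U+00AD is not present
    double_encoded = cmap.get(HYPHEN_MINUS) == glyph
    if contours.get(glyph, 0) > 0 or double_encoded:
        return "WARN"  # 2. has an outline, or shares the hyphen glyph
    if advance.get(glyph, 0) == 0:
        return "INFO"  # 1. zero width and no outline, as expected
    return "WARN"      # empty but non-zero width: not covered (assumption)


cmap = {0x2D: "hyphen", 0xAD: "uni00AD"}
print(classify_soft_hyphen(cmap, {"hyphen": 300, "uni00AD": 0},
                           {"hyphen": 1, "uni00AD": 0}))  # INFO
```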

glenda-tn, Mar 16 '23 16:03

@RosaWagner and others, what do you think about what @glenda-tn just said?

felipesanches, Mar 16 '23 16:03

@felipesanches maybe there was some confusion on my part between the rationale and the message. I would have liked the message to say what to do, since rationales are not displayed in some reports (such as in the GF repo):

⚠ WARN This font has a 'Soft Hyphen' character. [code: softhyphen] Please consider removing it from the font.

@glenda-tn, we don't want soft hyphen to be present. If the soft hyphen is not present, then it is a PASS.

RosaWagner, Mar 16 '23 16:03

@glenda-tn, are you OK with the approach that Rosalie proposed in her last message?

felipesanches, Mar 16 '23 20:03

oooh! I didn't realize Google actually wants the char to be removed. At TN, we are following the Unicode definition as I had posted in #4046 and repeat here:

Unlike U+2010 HYPHEN, which always has a visible rendition, the character U+00AD SOFT HYPHEN (SHY) is an invisible format character that merely indicates a preferred intraword line break position. If the line is broken at that point, then whatever mechanism is appropriate for intraword line breaks should be invoked, just as if the line break had been triggered by another hyphenation mechanism, such as a dictionary lookup. Depending on the language and the word, that may produce different visible results…
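To make the Unicode description concrete, here is a toy Python sketch of a greedy line breaker that treats U+00AD as an invisible break opportunity: the character disappears when the word is not broken, and a visible hyphen appears only when the line actually breaks there. It is not how any real layout engine is implemented. Measuring width in characters and always using "-" as the visible mark are assumptions; as the quote notes, the visible result depends on the language and the word.

```python
from __future__ import annotations

SHY = "\u00ad"  # U+00AD SOFT HYPHEN


def render_unbroken(word: str) -> str:
    """A word that is not broken at a SHY shows no trace of it."""
    return word.replace(SHY, "")


def break_word(word: str, room: int) -> tuple[str, str] | None:
    """Find the last SHY whose head, plus a visible hyphen, fits in `room`.

    Returns (head_with_hyphen, remaining_tail), or None if no SHY fits.
    """
    best = None
    for i, ch in enumerate(word):
        if ch == SHY:
            head = render_unbroken(word[:i]) + "-"
            if len(head) <= room:
                best = (head, word[i + 1:])
    return best


def layout(text: str, width: int) -> list[str]:
    """Greedy line breaking on spaces, falling back to SHY positions."""
    lines: list[str] = []
    current = ""
    pending = text.split(" ")
    while pending:
        word = pending.pop(0)
        sep = " " if current else ""
        visible = render_unbroken(word)
        if len(current) + len(sep) + len(visible) <= width:
            current += sep + visible  # word fits: its SHYs stay invisible
            continue
        split = break_word(word, width - len(current) - len(sep))
        if split:
            head, tail = split
            lines.append(current + sep + head)  # break at a SHY: hyphen shows
            current = ""
            pending.insert(0, tail)
        elif current:
            lines.append(current)  # move the whole word to the next line
            current = ""
            pending.insert(0, word)
        else:
            lines.append(visible)  # too long and no usable SHY: overflow
    if current:
        lines.append(current)
    return lines


print(layout("a hy\u00adphen\u00adation test", 40))  # ['a hyphenation test']
print(layout("a hy\u00adphen\u00adation test", 10))  # ['a hyphen-', 'ation test']
```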

At Type Network, a year ago we actually had to add uni00AD to a typeface for a client because:

Due to problems with hyphenation in printing (Office 365) documents a developer found out that the [TypeNetwork fonts] do not support soft hyphen characters (unicode: \u00AD)

I apologize, I had misunderstood your intent with the check. Now I get that you don't actually want it in the font. That said, your rationale is fine and clarifying. At TN, we will continue to require it, but at 0 width, with no outlines.

Thanks for your attention to this.

glenda-tn, Mar 16 '23 21:03

@glenda-tn, in this particular case, we're discussing a check that is currently in the universal profile, so I am trying to reach consensus on what is a good policy for everybody, instead of a Google Fonts-specific requirement.

It would then be great if we can figure out what criteria would better satisfy the needs of the majority of the font development community as a whole. Those who have diverging requirements can still have their vendor specific checks for that. But the goal here is to decide on what to do on the universal profile. What do you think?

felipesanches, Mar 16 '23 21:03

@felipesanches To keep the check Universal, I think following the Unicode spec is the correct thing to do, using the INFO, WARN, FAIL results that I posted above. The original rationale was fine, though I might cross out the last sentence.

This font has a 'Soft Hyphen' character (codepoint 0x00AD) which is supposed to be zero-width and invisible, and is used to mark a hyphenation possibility within a word in the absence of or overriding dictionary hyphenation. ~~It is mostly an obsolete mechanism now, and the character is only included in fonts for legacy codepage coverage. [code: softhyphen]~~

To give it more clout, I might add the quote from the Unicode page to the above (the quote is copied in my previous comment).

glenda-tn, Mar 17 '23 00:03

Another thing to keep in mind is the fact that the checks in the universal profile cover things that go beyond what the OpenType spec dictates.

So, we have checks in the opentype profile for things that the spec requires or recommends, and then the universal profile is for additional aspects that are not covered by the spec but that are generally established as good practices in the type design community.

felipesanches, Mar 17 '23 01:03

I once made a font which had a glyph associated with the soft hyphen code point. The glyph looked like a hyphen but had a negative sidebearing so that the hyphen would stick out a bit at the hyphenation point.

I observed that most implementations ignored that glyph and displayed the actual hyphen glyph at the end of the line, but at least one app* did display the glyph associated with the soft hyphen code point, which is what I actually wanted. I did not test what would happen if the font did not include a glyph there, but my conclusion was that it is safer if the soft hyphen code point is supported in the font's cmap, be it as a separate glyph or as an additional mapping to the hyphen glyph.

*) Unfortunately, I no longer remember which app it was.
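Whether U+00AD is supported as a separate glyph or as an additional mapping to the hyphen glyph can be read straight off the cmap. A small hypothetical helper, assuming the cmap is a plain `{code point: glyph name}` dict (which is what fontTools' `TTFont.getBestCmap()` returns):

```python
from __future__ import annotations


def codepoints_sharing_glyph(cmap: dict[int, str],
                             codepoint: int = 0x00AD) -> list[int]:
    """Return the other code points mapped to the same glyph as `codepoint`.

    An empty list means the code point is either absent or has its own glyph.
    """
    glyph = cmap.get(codepoint)
    if glyph is None:
        return []
    return sorted(cp for cp, name in cmap.items()
                  if name == glyph and cp != codepoint)


dual = {0x2D: "hyphen", 0xAD: "hyphen", 0x2010: "hyphen", 0x41: "A"}
separate = {0x2D: "hyphen", 0xAD: "uni00AD"}
print([hex(cp) for cp in codepoints_sharing_glyph(dual)])  # ['0x2d', '0x2010']
print(codepoints_sharing_glyph(separate))                  # []
```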

twardoch, Mar 17 '23 02:03

@twardoch when we chatted about it 18 months ago, you recommended removing the glyph completely ^^

@felipesanches taking all comments into account, I would go back to the previous rationale with the proper quote from Unicode provided by @glenda-tn. I am still bothered by the WARN message, though, which doesn't give any clear recommendation.

I just tested the soft hyphen in Office 365, InDesign, and native apps on both Windows and Mac, with fonts that have the soft hyphen and fonts that don't, and it seems to work fine in all cases. Printing the documents directly from the app also worked, as did printing from a PDF. We also did these tests when implementing the check the first time.

@glenda-tn do you happen to know which version of Office 365, which OS, and what kind of printer your client was using?

For now to improve the check, I would propose:

  • PASS if soft hyphen is absent
  • PASS if soft hyphen is present with 0-width and no outline
  • FAIL if the soft hyphen is present with a contour, with a suggestion to either remove it for recent environments or keep it but make it invisible for backward compatibility.

RosaWagner, Mar 17 '23 12:03

Great! It sounds like we're reaching consensus on this for the universal profile then!

I'll incorporate these proposed changes. Thanks!

felipesanches, Mar 17 '23 14:03

> @twardoch when we chatted about it 18 months ago, you recommended removing the glyph completely ^^

I see :)

twardoch, Mar 17 '23 16:03

@RosaWagner I do not know anything further about the client usage... this was over a year ago and we simply made it a point that going forward, we would require the code point.

I agree with @twardoch's earlier comment upthread that the code point U+00AD should be present in fonts. There are likely many existing documents that use it. While some apps/browsers may ignore it, there are probably some that don't, so there is a risk of .notdef showing up if the code point is missing from a font.

Another point to note is that the soft hyphen is used in other languages and connected scripts, like Arabic. From the same Unicode link I've been referring to, further down the page:

Hyphenation, and therefore the SHY, can be used with the Arabic script. If the rendering system breaks at that point, the display—including shaping—should be what is appropriate for the given language. For example, sometimes a hyphen-like mark is placed on the end of the line. This mark looks like a kashida, but is not connected to the letter preceding it. Instead, the appearance of the mark is as if it had been placed—and the line divided—after the contextual shapes for the line have been determined. For more information on shaping, see [UAX9] and Section 9.2, Arabic, of [Unicode].

That said, I do not think the absent code point should be a PASS, but rather, a FAIL.

glenda-tn, Mar 17 '23 16:03

>   • PASS if soft hyphen is absent
>   • PASS if soft hyphen is present with 0-width and no outline
>   • FAIL if the soft hyphen is present with a contour, with a suggestion to either remove it for recent environments or keep it but make it invisible for backward compatibility.

From our previous tests and the new ones Rosalie has performed, I would support this.

However, since we are discussing a Universal profile check, I would like to hear from @tiroj, who provided the first argument we considered in the previous issue.

vv-monsalve, Mar 18 '23 02:03

I think I would be inclined to go with

  • WARN or INFO if soft hyphen is absent
  • PASS if soft hyphen is present with 0-width and no outline
  • FAIL if the soft hyphen is present with a contour, with a suggestion to either remove it for recent environments or keep it but make it invisible for backward compatibility

While soft hyphen not being present is likely to be fine in most instances, there are edge cases where it could be either desired or recommended for accurate Windows codepage coverage.

There are a lot of fonts in the wild that have visible soft-hyphen glyphs, or that dual-map the /hyphen glyph to U+00AD, since many people are confused about the purpose of the character. I made a few myself before Khaled set me right a few years ago.

tiroj, Mar 18 '23 03:03

> There are a lot of fonts in the wild that have visible soft-hyphen glyphs, or that dual-map the /hyphen glyph to U+00AD, since many people are confused about the purpose of the character.

This. Noto, Roboto and RobotoFlex, Segoe, SourceSerif, SF Symbols, and so many Google Fonts all have a visible contour or are double-encoded. I have looked at so many. SF Pro is the only one I know of that completely omits the code point.

I would be fine with @tiroj's rationale for the Universal profile.

glenda-tn, Mar 18 '23 04:03

> Another point to note is that the soft hyphen is used in other languages and connected scripts, like Arabic. From the same Unicode link I've been referring to, further down the page:
>
> Hyphenation, and therefore the SHY, can be used with the Arabic script. If the rendering system breaks at that point, the display—including shaping—should be what is appropriate for the given language. For example, sometimes a hyphen-like mark is placed on the end of the line.

This is true, in theory: the Uyghur language, at least, uses the Arabic script and supports hyphenation. But I am not sure that it is true in practice. I don't think that any layout system correctly implements hyphenation for Arabic. (Apart from my own SILE typesetter. ;-)

simoncozens, Mar 23 '23 09:03

Taking everything into consideration, the check could use the following:

Log Level Result

  • INFO if soft hyphen is absent
  • PASS if soft hyphen is present with 0-width and no outline
  • FAIL if the /hyphen glyph is dual-mapped to U+00AD
  • FAIL if the soft hyphen is present with a contour, with a suggestion to either remove it for recent environments or keep it but make it invisible for backward compatibility.

Rationale

According to Unicode

Unlike U+2010 HYPHEN, which always has a visible rendition, the character U+00AD SOFT HYPHEN (SHY) is an invisible format character that merely indicates a preferred intraword line break position.

Nevertheless, it is recommended not to include it in the font at all, because discretionary hyphenation should be handled at the level of the shaping engine, not the font. If it must be added for backward compatibility, U+00AD should have zero width and no outline.

vv-monsalve, Mar 23 '23 18:03

After reading through this conversation, I agree with the statuses from @vv-monsalve’s latest comment. However, I think the rationale would need to be adjusted slightly further, to conform with the statuses. As Viviana wrote it above, the rationale emphasizes a recommendation to remove the glyph from fonts, which seems to conflict with the statuses.

Here’s my attempt to improve this, along with messages for each status:

Log level result

  • PASS if soft hyphen is present with zero width and no outline
    • Message: "The font contains a soft hyphen with zero width and no outline."
  • INFO if soft hyphen is absent.
    • Message: "It has been reported that the soft hyphen (U+00AD) is required for accurate Windows codepage coverage and proper handling in some printers. You may wish to add this glyph. Note: the soft hyphen glyph should have zero width, and no outline."
  • FAIL if the /hyphen glyph is dual-mapped to U+00AD
    • Message: "Unicode defines the soft hyphen as 'an invisible format character that merely indicates a preferred intraword line break position'. The soft hyphen should not be dual-mapped to the hyphen glyph, as the hyphen always has a visible rendition. It is valid to exclude the soft hyphen glyph. However, for the best Windows backwards compatibility, the soft hyphen glyph should have zero width, and no outline."
  • FAIL if the soft hyphen is present with a contour
    • Message: [same as other FAIL]
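One way these statuses and messages could be expressed in code, as a hypothetical Python sketch rather than fontbakery's actual implementation. It takes plain dictionaries (cmap, advance widths, contour counts) that a check might build with fontTools. The list above does not cover a soft hyphen that is empty but has a non-zero width; the sketch treats that as FAIL with the same message, which is an assumption.

```python
from __future__ import annotations

SOFT_HYPHEN = 0x00AD   # U+00AD SOFT HYPHEN
HYPHEN_MINUS = 0x002D  # U+002D HYPHEN-MINUS

PASS_MSG = "The font contains a soft hyphen with zero width and no outline."
INFO_MSG = (
    "It has been reported that the soft hyphen (U+00AD) is required for "
    "accurate Windows codepage coverage and proper handling in some printers. "
    "You may wish to add this glyph. Note: the soft hyphen glyph should have "
    "zero width, and no outline."
)
FAIL_MSG = (
    "Unicode defines the soft hyphen as 'an invisible format character that "
    "merely indicates a preferred intraword line break position'. The soft "
    "hyphen should not be dual-mapped to the hyphen glyph, as the hyphen "
    "always has a visible rendition. It is valid to exclude the soft hyphen "
    "glyph. However, for the best Windows backwards compatibility, the soft "
    "hyphen glyph should have zero width, and no outline."
)


def check_soft_hyphen(cmap: dict[int, str],
                      advance: dict[str, int],
                      contours: dict[str, int]) -> tuple[str, str]:
    """Return (status, message) for U+00AD following the proposed statuses."""
    glyph = cmap.get(SOFT_HYPHEN)
    if glyph is None:
        return "INFO", INFO_MSG  # soft hyphen absent
    if cmap.get(HYPHEN_MINUS) == glyph:
        return "FAIL", FAIL_MSG  # dual-mapped with /hyphen
    if contours.get(glyph, 0) > 0:
        return "FAIL", FAIL_MSG  # present with a contour
    if advance.get(glyph, 0) == 0:
        return "PASS", PASS_MSG  # zero width, no outline
    return "FAIL", FAIL_MSG      # empty but non-zero width (assumption)
```

The dual-mapping test runs before the contour test so that a shared /hyphen glyph, which always has a contour, is reported for the mapping; both paths return the same FAIL message either way.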

Rationale

According to Unicode:

Unlike U+2010 HYPHEN, which always has a visible rendition, the character U+00AD SOFT HYPHEN (SHY) is an invisible format character that merely indicates a preferred intraword line break position.

Google Fonts recommends not including a soft hyphen glyph in the font at all, because discretionary hyphenation should be handled at the level of the shaping engine, not the font.

If it must be added for backward compatibility, U+00AD should have zero width and no outline.

See the discussion at https://github.com/fonttools/fontbakery/issues/4095. See also https://www.unicode.org/reports/tr14/tr14-49.html#SoftHyphen

arrowtype, Jan 30 '25 21:01