Make back arrow character "←" different from underscore character "_"

Open MattHeffron opened this issue 1 year ago • 5 comments

The representation of character code 0x5F as "←" instead of "_" dates back to the 1963 ASCII standard, and Interlisp/Medley preserved that interpretation for backward compatibility with earlier systems which had Lisp implementations (e.g. DEC-10). (This seems not to match the XCCS encoding; as that, according to the Wikipedia XCCS article, has character code 0x005F as "_", with "←" at character code 0x00AC. Standard Medley fonts seem to have an encoding for character set 0 that differs from XCCS.)

Medley is a bit schizophrenic in how it treats character code 0x5F. In general, it is a character that can be used to construct LITATOMs. However, in CLISP constructs in the Interlisp world it may also be interpreted as an operator character. Simply changing the glyph for that character code to "_" would render the older CLISP code a bit less readable. (This might be acceptable!) Leaving it as "←" makes Common Lisp code look odd, and it can frustrate new users as that glyph isn't on modern keyboards. (This would keep all Interlisp/Medley documentation and publications correct.)

Modernizing Medley for current keyboards, and for Common Lisp code that uses Unicode glyph encodings, suggests splitting these into two independent first-class characters. There seem to be a few strategies for this:

  1. Change Medley over to be fully Unicode-based.
  2. Change character code 0x005F to "_" (matches Unicode, XCCS, current ASCII), set "←" at character code 0x00AC (XCCS, not Unicode, but it leaves "←" in character set 0).
  3. Change character code 0x005F to "_" (matches Unicode, XCCS, current ASCII), set "←" at character code 0x2190 (Unicode, not XCCS, and it moves "←" to character set 0x21 = 041).
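
To make the difference between strategies 2 and 3 concrete, here is a minimal Python sketch (not Medley code) of the code assignments each one would produce. The dictionary and function names are hypothetical; the code values are the ones given in the list above, and the character set is assumed to be the high byte of a 16-bit code.

```python
# Illustrative code assignments; the dictionary names are hypothetical.
CURRENT = {0x005F: "←"}                      # 0x5F is drawn as a left arrow today
STRATEGY_2 = {0x005F: "_", 0x00AC: "←"}      # "←" stays in character set 0
STRATEGY_3 = {0x005F: "_", 0x2190: "←"}      # "←" moves to its Unicode code


def charset_of(code: int) -> int:
    """Return the character set, taken here to be the high byte of a 16-bit code."""
    return code >> 8


for name, table in (("strategy 2", STRATEGY_2), ("strategy 3", STRATEGY_3)):
    for code, glyph in table.items():
        print(f"{name}: {glyph!r} at 0x{code:04X}, character set {charset_of(code):03o}")
```

Under this reading, strategy 2 keeps both characters in character set 0, while strategy 3 puts "←" in character set 041.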

The above 3 strategies (and any others that I didn't think of) would require:

  • Adding support for a "←" character in keyboard mapping that is different from "_".
  • File conversion utilities that somehow leave an indication that the file has been "converted".
    • These utilities likely would need to be interactive, as many cases would be ambiguous as to the intended character.
    • Tedit and Sketch (and other) files likely would use different heuristics for conversion, vs. for code files.
  • Revising all CLISP/DWIM code that interprets character code 0x5F as "←", to instead use the new character code for "←" functionality. (This may be a bit of chicken-and-egg issue with implementing the conversion utilities.)
  • Modifying all character set 0 font files to add the "_" glyph and update the WIDTHS, OFFSETS, and IMAGEWIDTHS information.
    • If there are fonts that have "_" but not "←" then the corresponding modification would be required.
  • Revise PostScript/PDF printing to handle the changed character codes appropriately.
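
The conversion-utility requirement could look roughly like the following Python sketch. It is only an illustration: the heuristic (treat a free-standing 0x5F as an arrow, flag everything else for a human) is invented here and is not taken from Medley or CLISP, and `convert_legacy` is a hypothetical name. In the input string, "_" stands in for the legacy character code 0x5F.

```python
def convert_legacy(text: str, arrow: str = "\u2190"):
    """Rewrite legacy 0x5F characters; return (new_text, ambiguous_offsets).

    Assumed heuristic: a 0x5F with whitespace on both sides is taken to be
    an assignment arrow; any other occurrence is left as "_" and its offset
    is recorded so an interactive tool could ask the user.
    """
    out = []
    ambiguous = []
    for i, ch in enumerate(text):
        if ch != "_":
            out.append(ch)
            continue
        prev_ch = text[i - 1] if i > 0 else " "
        next_ch = text[i + 1] if i + 1 < len(text) else " "
        if prev_ch.isspace() and next_ch.isspace():
            out.append(arrow)      # free-standing: treat as an arrow
        else:
            out.append("_")        # unchanged, flagged for review
            ambiguous.append(i)
    return "".join(out), ambiguous


converted, to_review = convert_legacy("X _ 5 FOO_BAR")
print(converted, to_review)
```

Because each replacement is one character for one character, the recorded offsets are valid in both the original and the converted text.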

#1 clearly would be a huge effort, but also would be most desirable (for the full Unicode support). #2 is comparatively simplest of the three, (but only because it leaves "←" in character set 0). As mentioned above, simply changing the glyph for that character code to "_" might be acceptable, and would be the simplest. All that would be required would be updating of fonts and PostScript/PDF printing. (Interpress and Press could be modified for completeness, but seem to be less useful. IMHO.)

MattHeffron avatar Oct 14 '24 20:10 MattHeffron

As far as I know, the XCCS (NS) fonts all have the "_" glyph where the Alto/Press fonts have "←", so it depends on which font set you're using. [Screenshot 2024-10-14 at 2 09 14 PM]

nbriggs avatar Oct 14 '24 21:10 nbriggs

I like the idea of changing what we mean by "XCCS" in an external format, to define the code rewrites so that "_" is left arrow and "^" is up-arrow. This is nominally an incompatible change but I think it would be better. We'd have to change the NS fonts to swap the glyphs.

masinter avatar Oct 14 '24 21:10 masinter

In particular

Adding support for a "←" character in keyboard mapping that is different from "_".

Not needed. There already is support. The keyboard when you type a "_" gives you the old-tty-character which prints as a left arrow in Medley. There's another XNS character for underscore and circumflex that don't ordinarily have keyboard assignments.

This should follow, or be part of, the work on https://github.com/Interlisp/medley/issues/58

File conversion utilities that somehow leave an indication that the file has been "converted".

Not really needed. Pretty much all medley sources can be used without conversion.

These utilities likely would need to be interactive, as many cases would be ambiguous as to the intended character. Tedit and Sketch (and other) files likely would use different heuristics for conversion, vs. for code files.

not needed

Revising all CLISP/DWIM code that interprets character code 0x5F as "←", to instead use the new character code for "←" functionality. (This may be a bit of chicken-and-egg issue with implementing the conversion utilities.)

not needed though not a bad idea

Modifying all character set 0 font files to add the "_" glyph and update the WIDTHS, OFFSETS, and IMAGEWIDTHS information.

just modify the NS fonts, leave alto fonts like GACHA and HELVETICA etc alone

If there are fonts that have "_" but not "←" then the corresponding modification would be required.

not sure there are any

Revise PostScript/PDF printing to handle the changed character codes appropriately.

This is a matter of undoing the patch to substitute

masinter avatar Oct 14 '24 21:10 masinter

To clarify: my preference would be that character code 0x005f be the "_" glyph, and some other character code be the "←" glyph. This is what new users would expect going forward; especially Common Lisp users. This is why file conversion utilities, and changing the CLISP/DWIM code seem to be necessary. This is also why

Adding support for a "←" character in keyboard mapping that is different from "_".

would be necessary; so that there is still some way to type the "←" character. (I suspect that keyboards with a "←" key are rare, or non-existent.) Once the character code is chosen, then it may be simply putting that into the KEYACTION table(s) on some key. But that may need the keyboard consistency changes to enable recognition of (at least one of the) "common" keys that currently are ignored (e.g., F10).

Re: changes to fonts, I looked only at HELVETICA's font character bitmap. So from @nbriggs comment:

As far as I know, the XCCS (NS) fonts all have the "_" glyph where the Alto/Press fonts have "←"

it appears that NS fonts already show 0x005f as the "_" glyph. (Do they have the "←" glyph?) I didn't remember that there were ≥ two different font sets. In my head everything was XCCS, as that's the internal character encoding.

MattHeffron avatar Oct 17 '24 01:10 MattHeffron

I just wanted to mention in passing that MIT's ITS and Stanford's WAITS operating systems have the same ambivalence about the "_" or "←" character (and also "^"/"↑"). They both got started on PDP-6 computers (the immediate precursor to the PDP-10 and DEC-10, DEC-20) in the mid-60s before ASCII finalized on _ and ^.

larsbrinkhoff avatar Oct 17 '24 05:10 larsbrinkhoff

Tedit uses the translation tables in INTERPRESS if asked to coerce, say, Helvetica to Modern. Right now, there are 2 different tables for that, \ASCIITONS and \ASCIITOSTAR. The difference is that \ASCIITONS has a mapping of neutral hyphen 0,55 to 41,76 (hyphen in the Japanese punctuation charset) that \ASCIITOSTAR doesn't have.
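
For readers not used to the octal "charset,code" notation, here is a small Python sketch of what the hyphen entry amounts to. It assumes a 16-bit code with the character set in the high byte; `xccs_code` and `ASCIITONS_EXTRA` are hypothetical names, not the actual table representation in INTERPRESS.

```python
def xccs_code(charset_octal: str, char_octal: str) -> int:
    """Combine an octal 'charset,code' pair into one 16-bit value."""
    return (int(charset_octal, 8) << 8) | int(char_octal, 8)


neutral_hyphen = xccs_code("0", "55")    # the ASCII hyphen-minus
mapped_hyphen = xccs_code("41", "76")    # hyphen in character set 041

# The single entry that \ASCIITONS has and \ASCIITOSTAR lacks.
ASCIITONS_EXTRA = {neutral_hyphen: mapped_hyphen}

print(f"0x{neutral_hyphen:04X} -> 0x{mapped_hyphen:04X}")
```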

There is a comment in the code that says that \ASCIITONS (with the hyphen translation) is XCCS-1-1-1 and that the Star version is for XCCS-2... with a comment "soon to come".

Is there any reason to maintain the NS hyphen mapping, or should we just let hyphens always be hyphens? Then at least we can focus on the underscore/left-arrow, caret/uparrow, and dollar-sign/currency issues.

rmkaplan avatar Dec 20 '24 19:12 rmkaplan

I would not translate the ASCII hyphen-minus to the Japanese punctuation charset. Published XCCS revisions evolved to, if I remember, XCCS 3-*, though I don't know if there was ever an implementation (either in fonts, or programs that expected that revision). I suspect that the XCCS based fonts we have are really all XCCS 1-1-... One might, possibly, be able to find Star/Globalview fonts out on the web somewhere as part of a Globalview installation package.

nbriggs avatar Dec 20 '24 20:12 nbriggs

OK, I will remove that hyphen mapping, get rid of the STAR vs NS distinction. Next, try to get dollar signs to work consistently. Our NS display fonts (Classic etc.) do have the glyph for $ at Ascii $, maybe only Interpress hardcopy needs to have that mapping. Not yet sure what postscript is doing.

rmkaplan avatar Dec 20 '24 20:12 rmkaplan

Not yet sure what postscript is doing.

PostScript's standard encoding vector has most ASCII glyphs where expected. $ glyph is at 0x24. POSTSCRIPTSTREAM assumes any character codes 0x20-0x7E are the corresponding ASCII glyphs. There is *POSTSCRIPT-NS-TRANSLATIONS* (and *POSTSCRIPT-NS-HASH*) to map NS characters that require a font change, or aren't in the standard encoding vector, or must be constructed (e.g., "0,274" as "¼"). This is used in \POSTSCRIPT.OUTCHARFN to render the actual characters. I think that I had a simplified version of this in the original POSTSCRIPTSTREAM and John Sybalsky extended it. (And someone with initials "rmk" touched it in 1993. 😉)
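
As a rough illustration of that lookup order (not the actual \POSTSCRIPT.OUTCHARFN logic), a Python sketch might look like this. `NS_TRANSLATIONS` is a hypothetical stand-in for *POSTSCRIPT-NS-TRANSLATIONS*; only the "0,274" entry comes from the description above, and the real table's format is not shown in this thread.

```python
# Hypothetical stand-in for *POSTSCRIPT-NS-TRANSLATIONS*.
NS_TRANSLATIONS = {0o274: "¼"}


def ps_glyph(code: int) -> str:
    """Pick the glyph to print for a character-set-0 code."""
    if 0x20 <= code <= 0x7E:
        return chr(code)              # treated as the corresponding ASCII glyph
    if code in NS_TRANSLATIONS:
        return NS_TRANSLATIONS[code]  # needs a font change or must be constructed
    raise LookupError(f"no translation for code {code:#o}")


print(ps_glyph(0x24), ps_glyph(0o274))
```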

MattHeffron avatar Dec 21 '24 00:12 MattHeffron

I confirmed that postscript maps the $ 0x24 to the dollar-sign glyph--$ in a Tedit file shows up as $ in pdf.

rmkaplan avatar Dec 21 '24 02:12 rmkaplan

I'll first note that there is an earlier related discussion in #1403.

Here's a proposal for moving forward a little bit, at least having a tool for piercing some of the confusion. As has been suggested in both of these discussions, we get bolloxed up because the character encoding that we use internally is mostly XCCS, but differs in a few important details. So when we say we are mapping characters in the Ascii (nee Alto) fonts Helvetica, Gacha... into XCCS/NS codes using the tables in INTERPRESS, that's not entirely correct.

So, let's start by saying that we are actually running with internal codes that are in our own Medley Character Encoding Standard (MCCS). MCCS differs from XCCS in that 0,44 is the dollar sign and not the currency symbol, and hyphen is hyphen. The table \ASCII2MCCS maps codes in Helvetica to codes in MCCS that have the same internal meaning as is associated with the Helvetica glyphs. The separate table \ASCII2XCCS would map to XCCS codes that would cause Interpress printing to produce the proper glyphs--that would be part of Interpress hardcopy--swapping codes and metrics maybe on the fly.
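
One way to keep such differences visible is to record them as data and diff the two standards. The Python sketch below is only illustrative: the table names and the `differences` helper are hypothetical, and the tables contain just the one entry discussed here (0,44), not the full standards.

```python
CURRENCY_SIGN = "\u00a4"  # the generic currency symbol

# Only the entry discussed above; everything else is omitted from this sketch.
XCCS_MEANING = {0o44: CURRENCY_SIGN}
MCCS_MEANING = {0o44: "$"}


def differences(a: dict, b: dict) -> dict:
    """Return {code: (meaning_in_a, meaning_in_b)} for codes where a and b disagree."""
    shared = sorted(a.keys() & b.keys())
    return {code: (a[code], b[code]) for code in shared if a[code] != b[code]}


for code, (xccs, mccs) in differences(XCCS_MEANING, MCCS_MEANING).items():
    print(f"0,{code:o}: XCCS {xccs!r} vs MCCS {mccs!r}")
```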

As Matt pointed out, Postscript already assumes that it is operating with MCCS codes. And even our NS display (but not Interpress) fonts have the dollar-sign glyph assigned to the dollar-sign code.

If MCCS is our basic scheme, we have a place to implement whatever meanings/glyphs decisions we make about the assignments of the Ascii caret and underscore codes.

If we want to propagate the Ascii/Alto assignments, then we would change the keyboard mappings for the caret and underscore keys to produce the arrow codes, and come up with some other reasonable way of typing caret and underscore (meta-caret, meta-underscore?). And then move the glyphs around in our current NS fonts (which could be done either when fonts are loaded or by reworking the files).

Or, let typing caret and underscore produce the caret and underscore codes (associated with the NS font glyphs but not the Ascii font glyphs--fix the Ascii fonts), and come up with a reasonable way of typing the up and left arrows (use meta-caret and meta-underscore for those?).

There are very few existing source files that contain uparrow/left arrow. We think of those files as being in XCCS, but in fact they are MCCS--we should read them as if that was their external format.

rmkaplan avatar Dec 22 '24 01:12 rmkaplan

I think we might need to add a little more complexity to avoid problems with backward compatibility. In particular I think we can avoid modifying the fonts we have by, say, adding a .FONTINFO file or some other container for extra metadata that indicates how the font will have to be transformed when using it with XCCS or MCCS. So rather than trying to transform all the fonts to fit into a narrow model, you just say which transformations are needed for a particular (set of) font(s).
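
A minimal Python sketch of what such per-font metadata could express follows. The record layout, the `FONTINFO` name, and `remaps_needed` are all hypothetical; the only facts used are the ones stated in this thread, namely that the Alto fonts draw "←" at 0x5F while the NS fonts draw "_" there.

```python
# Hypothetical per-font metadata: which glyph each font draws at a given code.
FONTINFO = {
    "HELVETICA": {0x5F: "←"},   # Alto font
    "GACHA": {0x5F: "←"},       # Alto font
    "CLASSIC": {0x5F: "_"},     # NS font
}


def remaps_needed(font: str, wanted: dict) -> dict:
    """Return {code: (glyph_in_font, glyph_wanted)} where the font disagrees with `wanted`."""
    actual = FONTINFO[font]
    return {c: (actual.get(c), g) for c, g in wanted.items() if actual.get(c) != g}


# Which fonts would need a transformation if 0x5F should display as "_"?
for font in FONTINFO:
    print(font, remaps_needed(font, {0x5F: "_"}))
```

The font files themselves stay untouched; only the description of how to transform them varies by target encoding.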

I think we need some better Lisp diagnostics to discover more about the fonts we do have -- which glyphs are missing (some fonts lack back-tick), and their use of $ dollar-sign, left-arrow/underbar, ^ (which I thought was a circumflex rather than a caret), and up-arrow.
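
The kind of report meant here might be sketched as follows, in Python rather than Lisp and purely as an illustration. It assumes some other tool has already produced the set of codes for which a font defines a glyph; `CODES_TO_CHECK` and `missing` are hypothetical names.

```python
# ASCII positions that this discussion keeps coming back to.
CODES_TO_CHECK = {
    0x24: "dollar sign",
    0x5E: "caret/circumflex or up-arrow",
    0x5F: "underscore or left-arrow",
    0x60: "back-tick",
}


def missing(defined_codes) -> list:
    """List the checked positions for which the font defines no glyph."""
    return [CODES_TO_CHECK[c] for c in sorted(CODES_TO_CHECK) if c not in defined_codes]


# Example input: a made-up font that defines everything checked except back-tick.
print(missing({0x24, 0x5E, 0x5F}))
```

This only reports whether a position is filled; telling which of two possible glyphs occupies it would still need inspection of the bitmap.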

I think also for backward compatibility that we should keep :XCCS as the name of the external format that you referred to as MCCS. If we want to name the official XCCS we could call it :XCCS-XEROX or something like that.

I don't think we need to change keyboard mappings. Typing shift-6 on a US keyboard should give you the code for ^ up-arrow/circumflex as it does now.

masinter avatar Dec 22 '24 22:12 masinter

I don't particularly want to change the keyboard mappings to codes - for either key (0x5E and 0x5F). I prefer the Alto fonts for programs, and the only time I would use NS fonts is for documents -- where the TEDIT.ABBREVS are already set up so you can ctrl-X to expand ^ to an up-arrow and _ to a left arrow in the NS font.

nbriggs avatar Dec 22 '24 23:12 nbriggs

Among other issues: Timesroman and Helvetica agree on their glyph assignments, including some random ones up above 128. The NS mapping table that we have been using doesn’t have entries for those. They also have ligatures of various sorts distributed among the control codes.

But Gacha is different in where it has glyphs: it doesn’t have glyphs below Space, and size 14 (but not 10 or 12) even has some reasonable glyphs above 128.

EDITFONT shows you what’s there (although it doesn’t make it easy to see what the code numbers are).

So we need to extend our tables for Helvetica and Timesroman, no matter what we do about the keyboard or file interpretations.

rmkaplan avatar Dec 23 '24 00:12 rmkaplan

Try

(FILESLOAD FONTSAMPLE)
(FNT.DISPTBL (SETQ FW (FNT.MAKEWIND)) (FONTCREATE 'CLASSIC 14))

to get a table that's arranged in rows of 16 characters.

nbriggs avatar Dec 23 '24 01:12 nbriggs

GACHA 12 has the underscore at 30Q, but neither 10 nor 14 do.
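
To locate a code such as 30Q in a table laid out in rows of 16, a small Python helper (hypothetical names, and assuming the table starts at code 0 with rows and columns counted from 0) shows the arithmetic:

```python
def parse_q(text: str) -> int:
    """Parse an Interlisp-style octal literal such as '30Q'."""
    return int(text.rstrip("Qq"), 8)


def table_position(code: int, width: int = 16):
    """Return (row, column) of a code in a table with `width` characters per row."""
    return divmod(code, width)


code = parse_q("30Q")
print(code, hex(code), table_position(code))
```

So 30Q is decimal 24 (0x18), which falls among the control codes below Space.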

nbriggs avatar Dec 23 '24 01:12 nbriggs

Addressed in #2280

rmkaplan avatar Oct 27 '25 07:10 rmkaplan