
Canvas TextMetrics additions for editing and text styling

Open schenney-chromium opened this issue 1 year ago • 25 comments

What problem are you trying to solve?

Selection and caret position are two building blocks for editing text in canvas content. Consider the sequence of dragging out a text selection with a mouse or touch, then copying and pasting into a new location. Determining which characters are part of the selection requires mapping a point onto a string, then to a caret position in the text. Drawing the selected region requires the selection area. Pasting requires mapping a point to an insertion position within the character string. It should be easy for authors to implement editing behavior in canvas.

In addition, we’ve seen increased demand for better text animation and control in canvas. Of particular concern are text strings where the mapping from character positions to rendered characters is complex or not known at the time of authoring due to font localization.

The use cases include:

  • The ability to control how individual graphemes are rendered (over a path or as part of an animation, for example). Consider text that traces the outline of a logo, or letters that animate together to form a word.
  • Manipulation of a glyph’s path (text effects, shaping, etc.). Individual characters may be colored differently, or custom shaping may be needed to integrate text into a scene.
  • Native support for i18n and BiDi layout. Users should be able to express advanced artistic/animated text rendered into canvas, in a wide array of fonts and languages, comparable to SVG text support.

What solutions exist today?

The existing TextMetrics APIs give an approximation of the bounding box for a string. To a first approximation, this can be used in JavaScript to implement the functionality needed for editing, but the bounds are only approximate. Furthermore, determining the caret position within a string that corresponds to a hit point requires a binary search or similar over substrings (i.e., is the point in the left or right half of the string?), recursively requiring O(log n) measureText() calls, each constructing a new TextMetrics. Each of these calls is relatively expensive.
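To make the cost concrete, here is a minimal sketch of that binary-search workaround. It assumes a caller-supplied `measurePrefix(n)` callback (hypothetical; in a page it might wrap `ctx.measureText(text.slice(0, n)).width`), left-to-right text, and prefix widths that grow monotonically. It ignores bidi and grapheme clusters, so it is only the first approximation described above.

```javascript
// Finds the caret index (0..length) nearest to `offset` by binary search.
// `measurePrefix(n)` is a hypothetical callback returning the width of the
// first n characters; each call stands in for one measureText() call.
function caretIndexByBinarySearch(length, offset, measurePrefix) {
  let lo = 0;
  let hi = length;
  let calls = 0;
  // Find the smallest n whose prefix width reaches `offset`.
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    calls++;
    if (measurePrefix(mid) < offset) {
      lo = mid + 1;
    } else {
      hi = mid;
    }
  }
  // Snap to whichever neighboring boundary is closer to `offset`.
  if (lo > 0) {
    const before = measurePrefix(lo - 1);
    const after = measurePrefix(lo);
    calls += 2;
    if (offset - before < after - offset) lo -= 1;
  }
  return { index: lo, calls };
}

// Usage with a fake monospace font (10 units per character):
const result = caretIndexByBinarySearch(8, 34, (n) => n * 10);
// result.index === 3, because 34 is nearer to the boundary at 30 than at 40
```

Even for a 1,024-character string this needs up to 13 measurements, each of which re-measures a substring, versus a single query on an existing TextMetrics object.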

There is currently no way to know which characters in a string correspond to individual glyphs rendered to screen, short of incorporating complete BiDi and font glyph analysis into your app. Laying out characters along a path, or applying per-glyph styling, is impossible without knowing which characters combine to form which glyphs.

How would you solve it?

Please see the full explainer, including demos, at https://github.com/Igalia/explainers/blob/main/canvas-formatted-text/text-metrics-additions.md

We propose four new functions on the TextMetrics interface:

dictionary TextAnchorPoint {
  DOMString align;
  DOMString baseline;
};

[Exposed=(Window,Worker)]
interface TextCluster {
    attribute double x;
    attribute double y;
    readonly attribute unsigned long begin;
    readonly attribute unsigned long end;
    readonly attribute DOMString align;
    readonly attribute DOMString baseline;
};

[Exposed=(Window,Worker)] interface TextMetrics {
  // ... extended from current TextMetrics.
  
  unsigned long caretPositionFromPoint(double offset);
  
  sequence<DOMRectReadOnly> getSelectionRects(unsigned long start, unsigned long end);
  DOMRectReadOnly getActualBoundingBox(unsigned long start, unsigned long end);

  sequence<TextCluster> getTextClusters(unsigned long start, unsigned long end, optional TextAnchorPoint anchor_point);
};

In addition, a new method on CanvasRenderingContext2D supports filling grapheme clusters:

interface CanvasRenderingContext2D {
    // ... extended from current CanvasRenderingContext2D.

    void fillTextCluster(TextCluster textCluster, double x, double y);
};

The caretPositionFromPoint method returns the character offset for the character at the given offset distance from the start position of the text run (accounting for textAlign and textBaseline) with offset always increasing left to right (so negative offsets are valid). Values to the left or right of the text bounds will return 0 or num_characters depending on the writing direction. The functionality is similar but not identical to document.caretPositionFromPoint. In particular, there is no need to return the element containing the caret and offsets beyond the boundaries of the string are acceptable.
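The clamping behavior can be illustrated with a small model. This is a sketch under stated assumptions, not the proposed algorithm: it treats the run as a single direction, takes hypothetical per-character advances as input (data a UA would have from shaping), measures `offset` from the run's left edge (as if textAlign were "left"), and snaps to the nearer character boundary.

```javascript
// Hypothetical model of an offset-to-index query for a single-direction run.
// advances[i] is the advance of logical character i; `offset` is measured
// from the run's left edge and always increases left to right.
function modelIndexFromOffset(advances, offset, direction = "ltr") {
  const n = advances.length;
  const total = advances.reduce((sum, a) => sum + a, 0);
  // Offsets outside the run clamp to one end, depending on direction.
  if (offset <= 0) return direction === "ltr" ? 0 : n;
  if (offset >= total) return direction === "ltr" ? n : 0;
  // In RTL text logical character 0 sits at the right edge, so measure
  // the distance from the right instead.
  const logicalOffset = direction === "ltr" ? offset : total - offset;
  let edge = 0;
  for (let i = 0; i < n; i++) {
    // The leading half of a character (in logical order) maps to the
    // boundary before it; the trailing half falls through to the next one.
    if (logicalOffset < edge + advances[i] / 2) return i;
    edge += advances[i];
  }
  return n;
}

const advances = [10, 10, 10, 10];
// modelIndexFromOffset(advances, 14, "ltr") === 1
// modelIndexFromOffset(advances, 14, "rtl") === 3
// modelIndexFromOffset(advances, -5, "rtl") === 4
```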

The other functions operate in character ranges and return bounding boxes relative to the text’s origin (i.e., textBaseline/textAlign is taken into account).

getSelectionRects() returns the set of rectangles that the UA would render as the selection background when a particular character range is selected.
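One reason getSelectionRects() returns a sequence rather than a single rectangle is bidi text: a contiguous logical range can map to several disjoint visual spans. The sketch below assumes hypothetical cluster records already in visual (left-to-right) order, each with a logical `begin`/`end`, an `x`, and a `width`; plain objects stand in for DOMRectReadOnly, and a partially selected cluster is selected whole.

```javascript
// Builds selection rectangles for the logical range [start, end) from
// clusters listed in visual order. Visually adjacent selected clusters are
// merged; an unselected cluster in between starts a new rectangle.
// y is relative to the baseline (canvas y grows downward).
function selectionRects(clusters, start, end, ascent, descent) {
  const rects = [];
  let current = null;
  for (const c of clusters) {
    const selected = c.begin < end && c.end > start;
    if (!selected) {
      current = null;
      continue;
    }
    if (current && Math.abs(current.x + current.width - c.x) < 1e-9) {
      current.width += c.width; // extend the open rectangle
    } else {
      current = { x: c.x, y: -ascent, width: c.width, height: ascent + descent };
      rects.push(current);
    }
  }
  return rects;
}

// Logical "abCDef" where "CD" is right-to-left, so it displays as a b D C e f.
const visual = [
  { begin: 0, end: 1, x: 0, width: 10 },  // a
  { begin: 1, end: 2, x: 10, width: 10 }, // b
  { begin: 3, end: 4, x: 20, width: 10 }, // D
  { begin: 2, end: 3, x: 30, width: 10 }, // C
  { begin: 4, end: 5, x: 40, width: 10 }, // e
  { begin: 5, end: 6, x: 50, width: 10 }, // f
];
// Selecting logical [1, 3) ("bC") yields two rects, at x = 10 and x = 30.
const rects = selectionRects(visual, 1, 3, 8, 2);
```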

getActualBoundingBox() returns the equivalent of the existing TextMetrics actualBoundingBox* attributes, restricted to the given range; that is, the bounding rectangle for the drawing of that range. Notice that this can be (and usually is) different from the selection rect, as the latter is about the flow and advance of the text. A font that is particularly slanted, or whose accents extend beyond the flow of text, will have a different paint bounding box. For example, if you select an italic W, you may see that the end of the W extends outside the selection highlight; that overhang is covered by the paint (actual bounding box) area.
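The difference between the two boxes can be shown with invented numbers. In this sketch each cluster carries hypothetical ink extents following the conventions of the existing actualBoundingBoxLeft/Right/Ascent/Descent attributes (measured from the cluster's own origin), and the range's ink box is simply their union; the advance-based extent is computed alongside for comparison.

```javascript
// Compares ink (paint) extent with advance extent for clusters overlapping
// [start, end). Cluster fields are assumptions for illustration:
// {begin, end, x, advance, inkLeft, inkRight, inkAscent, inkDescent}, where
// inkLeft/inkRight are distances left/right of the cluster origin x.
function rangeBoxes(clusters, start, end) {
  let ink = null;
  let advanceLeft = Infinity;
  let advanceRight = -Infinity;
  for (const c of clusters) {
    if (c.begin >= end || c.end <= start) continue;
    const left = c.x - c.inkLeft;
    const right = c.x + c.inkRight;
    if (!ink) {
      ink = { left, right, top: -c.inkAscent, bottom: c.inkDescent };
    } else {
      ink.left = Math.min(ink.left, left);
      ink.right = Math.max(ink.right, right);
      ink.top = Math.min(ink.top, -c.inkAscent);
      ink.bottom = Math.max(ink.bottom, c.inkDescent);
    }
    advanceLeft = Math.min(advanceLeft, c.x);
    advanceRight = Math.max(advanceRight, c.x + c.advance);
  }
  return { ink, advance: { left: advanceLeft, right: advanceRight } };
}

// An invented italic W whose top-right overhangs its 12-unit advance.
const italicW = {
  begin: 0, end: 1, x: 0, advance: 12,
  inkLeft: -0.5, inkRight: 14, inkAscent: 9, inkDescent: 0,
};
const boxes = rangeBoxes([italicW], 0, 1);
// boxes.ink.right is 14 while boxes.advance.right is 12
```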

getTextClusters() provides the ability to render minimal grapheme clusters (in conjunction with a new method for the canvas rendering context, more on that later). That is, for the character range given as input, it returns the minimal logical units of text, each of which can be rendered, along with their corresponding positional data. The position is calculated with the original anchor point for the text as reference, while the text_align and text_baseline parameters determine the desired alignment of each cluster.

To render these clusters on the screen, a new method for the rendering context is proposed: fillTextCluster(). It renders the cluster with the text_align and text_baseline stored in the object, ignoring the values set in the context. Additionally, to guarantee that the rendered cluster is accurate with the measured text, the rest of the CanvasTextDrawingStyles must be applied as they were when ctx.measureText() was called, regardless of any changes in these values on the context since. Note that to guarantee that the shaping of each cluster is indeed the same as it was when measured, it's necessary to use the whole string as context when rendering each cluster.

For text_align specifically, the position is calculated with regard to the advance of the grapheme cluster in the text. For example, if the text_align passed to the function is center, then for the letter T in the string Test, the returned position will not be exactly in the middle of the T. This is because the advance is reduced by the kerning between the first two letters, making it less than the width of a T rendered on its own.
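A sketch of that rule, assuming (hypothetically) that the reported position is the cluster's left edge plus a fraction of its advance chosen by the alignment. Only physical left/center/right are modeled, and the metrics are invented to mimic the kerned "T" in "Test".

```javascript
// Position reported for a cluster under a given alignment, computed from the
// cluster's left edge and its advance (which includes kerning), not from the
// width the glyph would have when drawn on its own.
function alignedX(leftEdge, advance, align) {
  switch (align) {
    case "left":
      return leftEdge;
    case "center":
      return leftEdge + advance / 2;
    case "right":
      return leftEdge + advance;
    default:
      throw new RangeError(`unsupported align: ${align}`);
  }
}

// Invented metrics: "T" alone is 12 units wide, but kerning with the
// following "e" shortens its advance in "Test" to 10.5.
const standaloneT = 12;
const kernedT = 10.5;
const centerInTest = alignedX(0, kernedT, "center"); // 5.25
const centerAlone = alignedX(0, standaloneT, "center"); // 6
```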

Anything else?

A very minimalist editor built on this functionality is at https://blogs.igalia.com/schenney/html/editing-canvas-demo.html

See https://blogs.igalia.com/schenney/canvas-text-editing/ for details on which browser version and flags are required.

schenney-chromium • Oct 05 '24 14:10

I think TextCluster could be a dictionary, instead of a non-constructible but mutable class. Do you agree, or is there something I'm missing?

domenic • Oct 06 '24 05:10

The TextCluster object must also store internally the context's styles like font and I guess all of the CanvasTextDrawingStyles, as they were when measureText() has been called, so I suppose an interface makes sense?

However it's unclear why it's mutable indeed, nor why the text is exposed and not just stored internally too. IIUC, it's the original text that was passed to measureText, so I suppose authors should already know it.

Also the x and y arguments to fillTextCluster() are a bit unclear to me. I guess they're the equivalent of fillText's x and y, but in that case how come they can be optional?

Kaiido • Oct 07 '24 00:10

It seems this intends to cover some of the same use cases as #10650.

cc @whatwg/canvas @khushalsagar

annevk • Oct 07 '24 06:10

I think TextCluster could be a dictionary, instead of a non-constructible but mutable class. Do you agree, or is there something I'm missing?

You're right. We originally thought of it as an interface since the underlying object has to save references to other objects and that felt more natural, but having it be a dictionary is more useful from the user's side. Plus, a standard dictionary would allow creating modified copies via the spread syntax. I'll update it.

The TextCluster object must also store internally the context's styles like font and I guess all of the CanvasTextDrawingStyles, as they were when measureText() has been called, so I suppose an interface makes sense?

Indeed, that is the idea. In the current prototype in Chromium (CL is currently under review), the TextCluster object holds a reference to the font in order to replicate the text accurately even if the font set in the context has changed (which is also relevant if the font wasn't fully loaded when measureText() was called). We had originally thought of only the font but I agree that it's better to include all CanvasTextDrawingStyles.

However it's unclear why it's mutable indeed, nor why the text is exposed and not just stored internally too. IIUC, it's the original text that was passed to measureText, so I suppose authors should already know it.

We think it can be useful to allow the creation of TextClusters directly from JS if desired. In that case it would be the author's responsibility to guarantee that no ligatures or glyphs are separated. For that to work the whole text is needed as context to enable the correct shaping of the ligatures or other context-dependent modifications that the font can define. For all other attributes, I agree that we should make them immutable. I will update that too.

Also the x and y arguments to fillTextCluster() are a bit unclear to me. I guess they're the equivalent of fillText's x and y, but in that case how come they can be optional?

The idea for the x and y arguments passed to fillTextCluster() is that by rendering all the clusters returned by getTextClusters() at a specific position, the rendered result is exactly the same as calling fillText() at that same position. In other words, it works as a delta for the internal x and y attributes that are part of the TextCluster object. If they are not passed, the values from TextCluster are used directly (so the equivalent call would be ctx.fillText(text, 0, 0)).
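The intended invariant can be checked with a stand-in for the rendering context. In this sketch `fakeFillTextCluster` and the cluster values are hypothetical: the stand-in only records where each cluster would be drawn, assuming the draw position is the cluster's stored x/y plus the x/y arguments.

```javascript
// Records draw positions instead of painting. Assumes the semantics described
// above: fillTextCluster(cluster, x, y) draws at (cluster.x + x, cluster.y + y).
function makeRecorder() {
  const calls = [];
  return {
    calls,
    fakeFillTextCluster(cluster, x, y) {
      calls.push({ begin: cluster.begin, x: cluster.x + x, y: cluster.y + y });
    },
  };
}

// Hypothetical clusters for "Hi!", positioned relative to the text origin.
const clusters = [
  { begin: 0, end: 1, x: 0, y: 0 },
  { begin: 1, end: 2, x: 11, y: 0 },
  { begin: 2, end: 3, x: 15, y: 0 },
];

const recorder = makeRecorder();
for (const cluster of clusters) {
  // Same (x, y) as the equivalent fillText("Hi!", 50, 80) call.
  recorder.fakeFillTextCluster(cluster, 50, 80);
}
// recorder.calls places the clusters at x = 50, 61, 65, all with y = 80.
```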

Thanks for your comments!

AndresRPerez12 • Oct 07 '24 23:10

We think it can be useful to allow the creation of TextClusters directly from JS if desired.

I fail to see how this could work. As per your previous point, authors would also need to define all of the CanvasTextDrawingStyles inside that object for the engine to make sense of it. You'll then enter the issue of how an author can point to an actual font. As a string? Then when is it parsed? It could be as a FontFace but that would be quite novel since the 2D context doesn't accept this kind of object yet. I might have missed it in the explainer, but I guess it would be clearer to me if you could share an example use-case for the ability to modify a TextCluster, or to create one from scratch without going through measureText().getTextClusters().

(so the equivalent call would be ctx.fillText(text, 0, 0) ).

I think that'd be the first such positioning argument that defaults to 0 in the whole API. That feels odd to me.

Kaiido • Oct 08 '24 00:10

Sorry, I'm a bit confused. Dictionaries can't save references to other objects, so if that's indeed needed, then staying with the current interface design makes the most sense.

domenic • Oct 08 '24 00:10

It seems this intends to cover some of the same use cases as #10650.

cc @whatwg/canvas @khushalsagar

Yes, the issues are related in that they came out of the same discussions and prior proposals for improving canvas text. The editing aspects of this proposal could be covered by just inserting HTML content and editing that, but the proposal here is simpler from both an implementation and author perspective.

The access to text cluster information is unique to this proposal and really to a canvas context where the author has direct control of placement of everything.

schenney-chromium • Oct 08 '24 14:10

I fail to see how this could work. As per your previous point, authors would also need to define all of the CanvasTextDrawingStyles inside that object for the engine to make sense of it. You'll then enter the issue of how an author can point to an actual font. As a string? Then when is it parsed? It could be as a FontFace but that would be quite novel since the 2D context doesn't accept this kind of object yet. I might have missed it in the explainer, but I guess it would be clearer to me if you could share an example use-case for the ability to modify a TextCluster, or to create one from scratch without going through measureText().getTextClusters().

For modification, the main use case we have thought of so far would be to actually draw the cluster at the position x, y passed to fillTextCluster(). This is useful if you want to animate your text as a rotating circle. It's possible to use the x value from each cluster as a way to know where in the circle that cluster should start. But after that, it would be better to be able to call fillTextCluster() with the actual final position for the cluster, and that requires setting the x and y values stored in the cluster to 0. Allowing the modification of these two values is (at least for now) our approach for this use case. The rest of the attributes of TextCluster should be immutable though, I agree.

For the manual creation of TextCluster objects, we thought of it as a potentially interesting feature but if it feels not useful or inconsistent I'm okay with not allowing authors to do that directly, and only supporting clusters originating from getTextClusters(). Our original idea in that case was to use the CanvasTextDrawingStyles from the rendering context if they aren't available from the TextCluster, but now I realize that can even have inconsistencies with changes in the context state. So I agree on that too!

(so the equivalent call would be ctx.fillText(text, 0, 0) ).

I think that'd be the first such positioning argument that defaults to 0 in the whole API. That feels odd to me.

After discussing, I realized I mixed this up with another default. This default is just a remnant of how my first prototype was implemented. I agree the x and y in fillTextCluster() shouldn't be optional. I will update the explainer accordingly.

AndresRPerez12 • Oct 08 '24 21:10

What does TextCluster really represent? Is TextMetrics the right place to clusterize the text? TextMetrics is just metrics for a string. Making TextMetrics.getTextClusters() return sequence<TextCluster> where each TextCluster has a DOMString is strange. TextMetrics does not return the string it measures. It does not know the x and y this string will be displayed at. But the returned TextCluster knows its x and y.

I think what you are trying to do is a job of TextAnalyzer, TextItemizer or TextClusterizer more than a job of TextMetrics.

The name of this method is a little bit confusing:

unsigned long caretPositionFromPoint(double offset);

First it does not return a CaretPosition. It just returns an offset in a string. Second it does not take a point like Document.caretPositionFromPoint(). It just takes a distance from the origin of display.

shallawa • Oct 08 '24 23:10

The name of this method is a little bit confusing:

unsigned long caretPositionFromPoint(double offset);

First it does not return a CaretPosition. It just returns an offset in a string. Second it does not take a point like Document.caretPositionFromPoint(). It just takes a distance from the origin of display.

Yes, I agree there is some potential for confusion in the arguments differing from the DOM version, but there is some discoverability benefit to having the same name. Would caretLocationFromOffset() be an improvement?

schenney-chromium • Oct 09 '24 15:10

Post WHATWG meeting action items:

  • Rename caretPositionFromPoint to caretLocationFromOffset.
  • Should we provide a locale that overrides the local settings, to get consistent rendering of text in, for instance, company logos? Look at the SVG solution to this kind of thing.
  • Examine the relationship to Intl.Segmenter.

schenney-chromium • Nov 21 '24 17:11

I think caretLocationFromOffset is still a bit weird. There's no caret here. And we don't call a string index a location. It's also unclear whether this accounts for grapheme clusters or not. And if it does, perhaps locale needs to be an input here or elsewhere? indexFromOffset() or some such seems more closely aligned with JavaScript.

annevk • Nov 22 '24 09:11

cc @whatwg/i18n

annevk • Nov 22 '24 09:11

I think caretLocationFromOffset is still a bit weird. There's no caret here. And we don't call a string index a location. It's also unclear whether this accounts for grapheme clusters or not. And if it does, perhaps locale needs to be an input here or elsewhere? indexFromOffset() or some such seems more closely aligned with JavaScript.

"index" is exactly what I was thinking after the meeting yesterday.

The behavior here accounts for graphemes the same way that document.caretPositionFromPoint does (generally we use the same underlying code, though not quite in the Chromium case). It's also the code that decides the boundaries of a selection range. It tries to split ligatures but gives the first or last index of a grapheme cluster depending on the bidi direction and whether this point is the start or end of the selection range. For a single point query we take the start of the cluster in the bidi direction.
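The endpoint rule can be approximated in script with the standard Intl.Segmenter API. This sketch is an assumption, not Chromium's code: it ignores bidi direction and the ligature splitting mentioned above, and simply moves a selection start back to the beginning of its grapheme cluster and a selection end forward to the end of its cluster.

```javascript
// Snaps a UTF-16 index to grapheme-cluster boundaries using Intl.Segmenter.
// edge === "start" moves back to the cluster start; "end" moves forward to
// the cluster end. Indices already on a boundary are returned unchanged.
function snapToGrapheme(text, index, edge, locale) {
  const segmenter = new Intl.Segmenter(locale, { granularity: "grapheme" });
  for (const { index: begin, segment } of segmenter.segment(text)) {
    const end = begin + segment.length;
    if (index > begin && index < end) {
      return edge === "start" ? begin : end;
    }
  }
  return index;
}

// "e" + U+0301 COMBINING ACUTE ACCENT is one grapheme spanning indices 0..2.
// snapToGrapheme("e\u0301x", 1, "start") === 0
// snapToGrapheme("e\u0301x", 1, "end") === 2
```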

Don't even get me started on figuring out an index from point in a mixed bidi string. The best logic there is highly context dependent and there is no existing spec language to cover it, nor do I think there should be.

schenney-chromium • Nov 22 '24 13:11

Regarding compatibility with the TC39 Intl.Segmenter function and associated data.

It definitely seems worth considering the use of JS for segmentation and changing getTextClusters to take a Segments instance, returning x, y, width, height for each segment. The major advantage of this is "free" support for localization control, plus options to control how segmentation occurs; e.g., we could get per-word or per-sentence segmentation. I can see some value to that. The downside is that the JS segmentation may not be the same as the layout/rendering segmentation, making it more complex to convert the indexes to rects, particularly given localization.

The biggest challenge to this proposal is that the Segment objects in JS exist only in JS; they are not objects created from IDL bindings. We manage to handle this for Date and String, so it should be possible in theory. Practice is another matter.

schenney-chromium • Nov 23 '24 15:11

I've compared the results of the Intl.Segmenter segment function and Chromium's implementation of TextMetrics.getTextClusters and they differ in a key respect. The JS method clusters but does not perform visual re-ordering for BIDI, whereas for the canvas case we want the clusters in visual order.

We could have TextMetrics.getTextClusters take a Segments instance and then map it into visual order and compute geometric information. The advantage of that is localization info is present in the clustering. The canvas implementation would need to compute visual order runs and then map the grapheme indices onto those runs, then shape each grapheme for geometric information.
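To give a sense of that extra step, here is a sketch of the reordering half only. It uses the standard Intl.Segmenter for logical-order graphemes, but the per-code-unit embedding `levels` are hypothetical input (JavaScript has no built-in bidi resolver), and it applies just rule L2 of the Unicode Bidirectional Algorithm (UAX #9), treating each grapheme as an indivisible unit: from the highest level down to the lowest odd level, reverse every maximal run at or above that level. Shaping each grapheme for geometry is not shown.

```javascript
// Reorders logical grapheme segments into visual order (UAX #9 rule L2).
// levels[i] is the resolved embedding level of code unit i (assumed input).
function visualSegments(text, levels) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  // Each grapheme takes the level of its first code unit.
  const items = Array.from(segmenter.segment(text), (s) => ({
    begin: s.index,
    end: s.index + s.segment.length,
    level: levels[s.index],
  }));
  const maxLevel = Math.max(0, ...items.map((it) => it.level));
  let lowestOdd = Infinity;
  for (const it of items) {
    if (it.level % 2 === 1) lowestOdd = Math.min(lowestOdd, it.level);
  }
  for (let level = maxLevel; level >= lowestOdd; level--) {
    let i = 0;
    while (i < items.length) {
      if (items[i].level < level) {
        i++;
        continue;
      }
      let j = i;
      while (j < items.length && items[j].level >= level) j++;
      // Reverse the maximal run items[i..j) in place.
      items.splice(i, j - i, ...items.slice(i, j).reverse());
      i = j;
    }
  }
  return items;
}

// "ab" + two Hebrew letters + "cd", with the Hebrew at level 1.
const order = visualSegments("ab\u05D0\u05D1cd", [0, 0, 1, 1, 0, 0]).map((s) => s.begin);
// order is [0, 1, 3, 2, 4, 5]
```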

It doesn't seem worth it right now.

schenney-chromium • Dec 04 '24 23:12

The Explainer has been updated with changes to the text cluster methods. The related IDL now looks like this:

dictionary TextClusterOptions {
  DOMString align;
  DOMString baseline;
  double x;
  double y;
};

[Exposed=(Window,Worker)]
interface TextCluster {
    readonly attribute double x;
    readonly attribute double y;
    readonly attribute unsigned long begin;
    readonly attribute unsigned long end;
    readonly attribute DOMString align;
    readonly attribute DOMString baseline;
};

[Exposed=(Window,Worker)] interface TextMetrics {
  // ... extended from current TextMetrics.
  
  unsigned long getIndexFromOffset(double offset);
  
  sequence<DOMRectReadOnly> getSelectionRects(unsigned long start, unsigned long end);
  DOMRectReadOnly getActualBoundingBox(unsigned long start, unsigned long end);

  sequence<TextCluster> getTextClusters(unsigned long start, unsigned long end, optional TextClusterOptions options);
};

In addition, a new method on CanvasRenderingContext2D supports filling grapheme clusters:

interface CanvasRenderingContext2D {
    // ... extended from current CanvasRenderingContext2D.

    void fillTextCluster(TextCluster textCluster, double x, double y, optional TextClusterOptions options);
};

The options to getTextClusters give parameters to use when computing the cluster positions. The same options to fillTextCluster override the values used when the set of clusters was generated with getTextClusters.
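One reading of that override rule, sketched with plain objects: values passed in fillTextCluster()'s options win, and anything left undefined falls back to what the cluster recorded. The helper name `resolveClusterDrawParams` and the per-key fallback are assumptions for illustration, not part of the proposal's text.

```javascript
// Hypothetical helper: merges a cluster's stored alignment and position with
// optional overrides. Only keys explicitly defined in `options` override.
function resolveClusterDrawParams(cluster, options = {}) {
  const resolved = {
    align: cluster.align,
    baseline: cluster.baseline,
    x: cluster.x,
    y: cluster.y,
  };
  for (const key of Object.keys(resolved)) {
    if (options[key] !== undefined) resolved[key] = options[key];
  }
  return resolved;
}

const cluster = { begin: 0, end: 1, x: 42, y: 0, align: "left", baseline: "alphabetic" };
const params = resolveClusterDrawParams(cluster, { align: "center", x: 0, y: 0 });
// params: { align: "center", baseline: "alphabetic", x: 0, y: 0 }
```

Checking `!== undefined` rather than truthiness matters here, because an override of `x: 0` is exactly the circle-layout case and must not be ignored.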

There is an example using these options to render text on a circle.

Note also the rename to getIndexFromOffset().

schenney-chromium • Dec 11 '24 22:12

Regarding localization: the <canvas> element should already be respecting the lang attribute to control the fonts used. The spec should be updated if it doesn't already say that, and the lang should determine everything about fonts for all canvas text rendering, including metrics. Using that attribute to control the locale for the methods proposed here would then match how the text is subsequently rendered. It's what I would expect.

Control of localization in JS calls seems to me to be more targeted toward non-bindings situations, like node.js.

This HTML confirms that, in Chrome at least, the lang attribute on the canvas influences the text metrics:

<!DOCTYPE html>
<html>
  <body>
    <!-- Two identical canvases that differ only in their lang attribute. -->
    <canvas id="en" width="300" height="300" lang="en"></canvas>
    <canvas id="bg" width="300" height="300" lang="bg"></canvas>
    <script>
      const enContext = document.getElementById("en").getContext("2d");
      const bgContext = document.getElementById("bg").getContext("2d");

      // Draws and measures the same Cyrillic string. If the font has
      // Bulgarian-specific glyph forms, lang="bg" can select them and the
      // measured width can differ between the two canvases.
      function drawText(context) {
        context.font = "20px Commissioner";
        const text = "абвгд";
        context.fillStyle = "black"; // the 2D context has no "color" property
        context.fillText(text, 50, 50);
        const metrics = context.measureText(text);
        console.log(context.canvas.lang, metrics.width);
      }

      const myFont = new FontFace(
        "Commissioner",
        "url(https://fonts.gstatic.com/s/commissioner/v20/tDbw2o2WnlgI0FNDgduEk4jAhwgumbU1SVfU5BD8OuRL8OstC6KOhgvBYWSFJ-Mgdrgiju6fF8m0bkXaexs.woff2)"
      );

      // Draw only after the web font has loaded, so both canvases use it.
      myFont.load().then((font) => {
        document.fonts.add(font);
        drawText(enContext);
        drawText(bgContext);
      });
    </script>
  </body>
</html>

schenney-chromium • Dec 11 '24 22:12

The lang attribute doesn't work for OffscreenCanvas though. And if it's an actual input we should make that explicit.

annevk • Dec 12 '24 08:12

The options to getTextClusters give parameters to use when computing the cluster positions. The same options to fillTextCluster override the values used when the set of clusters was generated with getTextClusters.

There is an example using these options to render text on a circle.

A bit more context on the options parameter for fillTextCluster():

This came about as an answer to concerns expressed here about why not all the attributes of a TextCluster object were readonly. Our reasoning for keeping the x and y attributes mutable before was enabling cases like in the example above, in which the positional data of the clusters is used to make some calculation (in this case, where on the circle to place each cluster based on its x value compared to the whole width), but then render the cluster at a specific location based on that calculation.

We agree that it's better to have all attributes as readonly, so we came up with this options parameter for fillTextCluster() that can override these values, as well as the align and baseline, that are used to render the cluster to provide this flexibility without having to modify the cluster at all. This way, the options for selecting what position is desired when measuring the text and obtaining the cluster can be independent of the options used to render if desired.
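The circle calculation itself can be sketched as pure layout math: each cluster's x, as a fraction of the total width, picks an angle, and the cluster's origin goes on the circle at that angle. The function name, the clockwise-from-the-top convention, and the cluster records are assumptions for illustration, not the explainer's code.

```javascript
// Maps clusters onto a circle of radius r centered at (cx, cy), starting at
// the top and proceeding clockwise in canvas coordinates (y grows downward).
// Each cluster's share of the line width determines its angle.
function layoutOnCircle(clusters, totalWidth, cx, cy, r) {
  return clusters.map((cluster) => {
    const angle = -Math.PI / 2 + (cluster.x / totalWidth) * 2 * Math.PI;
    return {
      begin: cluster.begin,
      // Point on the circle where this cluster's origin should go.
      x: cx + r * Math.cos(angle),
      y: cy + r * Math.sin(angle),
      // Rotation that keeps the cluster's baseline tangent to the circle.
      rotation: angle + Math.PI / 2,
    };
  });
}

// Four hypothetical clusters spread over a 100-unit-wide line.
const sample = [
  { begin: 0, x: 0 }, { begin: 1, x: 25 }, { begin: 2, x: 50 }, { begin: 3, x: 75 },
];
const placed = layoutOnCircle(sample, 100, 100, 100, 50);
// placed[0] is at the top of the circle, about (100, 50);
// placed[1] is at the right, (150, 100)
```

In a canvas, each entry could then be drawn by translating to (x, y), rotating by `rotation`, and calling fillTextCluster() with x/y overrides of 0, which is the kind of use the options parameter is meant to allow without mutating the cluster.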

Please let us know what you think!

AndresRPerez12 • Dec 12 '24 15:12

The lang attribute doesn't work for OffscreenCanvas though. And if it's an actual input we should make that explicit.

Offscreen canvas should be fixed to accept lang, because it also renders text and one would not expect the offscreen canvas font choices to differ from the canvas you are putting the offscreen content into. That would be adding an optional locale to the constructor for OffscreenCanvas. If the offscreen is created from a canvas element it would get the canvas element's locale.

I suppose that's a separate issue.

schenney-chromium • Dec 12 '24 16:12

Created https://github.com/whatwg/html/issues/10862 for adding a lang parameter for offscreen canvas. Right now it always uses the OS locale, it seems.

schenney-chromium • Dec 13 '24 02:12

I still have a problem with this proposal for two reasons:

  1. You are adding a TextAnalyzer to TextMetrics, which is tied to the selected font. TextClusters are more than just splitting the text. I think it deserves its own new interface.
  2. Analyzing text and using the output of this analysis has to go through multiple steps, not just a call to TextMetrics.getTextClusters(). These steps are:
  • Analysis: Text will be split into something like TextGroups. All characters in a TextGroup have the same language and the same bidirectional level. This step does not need any font. It just depends on the code points of the text.
  • Shaping: each TextGroup can be shaped in one step using the currently selected font. The result of this step is (1) a set of TextClusters, (2) glyph IDs in the font and (3) positions for the glyphs. One glyph can map to one or more characters; for example, multiple Arabic characters can be displayed by one glyph. And in some Indic languages a character can be displayed by more than one glyph. A TextCluster is the smallest displayable unit of a TextGroup. It maps one or more characters from the TextGroup to one or more glyphs. For example, a TextCluster can map one character to multiple glyphs or map multiple characters to one glyph. Shaping also returns the glyph positions assuming the text is displayed horizontally at x = 0. Font substitution is also applied in this step. For example, a TextCluster may be split because not all the glyphs in it can be displayed by the selected font, and a substitute font has to be used instead.
  • Justification: This optional step shifts glyph positions to expand or shrink the text to a certain width. In Arabic, the glyphs will be expanded by kashidas. In other languages, inter-character and inter-word justification can be applied by spaces.
  • Usage: In this step, TextGroups can be displayed, measured or hit-tested. A web developer should not have to loop through TextClusters, glyphs or characters to map from a display offset to a character index.
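To show what a boundary from the first (analysis) step looks like in practice, here is a deliberately crude sketch. It is not the Unicode Bidirectional Algorithm and ignores language: it treats Hebrew and Arabic script letters as right-to-left, any other letter as left-to-right, and lets everything else (spaces, digits, punctuation, combining marks) join the current group.

```javascript
// Splits text into groups of consistent (approximate) direction.
// RTL: Hebrew or Arabic script; LTR: any other letter; everything else is
// neutral and stays in the current group. A crude stand-in for real itemization.
const RTL_LETTER = /[\p{Script=Hebrew}\p{Script=Arabic}]/u;
const LETTER = /\p{L}/u;

function itemizeByDirection(text) {
  const groups = [];
  let current = null;
  for (const ch of text) { // iterates by code point, not UTF-16 unit
    let dir = null;
    if (RTL_LETTER.test(ch)) dir = "rtl";
    else if (LETTER.test(ch)) dir = "ltr";
    if (current && (dir === null || dir === current.dir)) {
      current.text += ch;
    } else {
      // A neutral at the very start opens an LTR group (an assumption).
      current = { dir: dir ?? "ltr", text: ch };
      groups.push(current);
    }
  }
  return groups;
}

const groups = itemizeByDirection("abc \u05E9\u05DC\u05D5\u05DD def");
// groups: "abc " (ltr), Hebrew word plus trailing space (rtl), "def" (ltr)
```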

To summarize if you want to add the text analysis to the canvas, please choose where the interface should be carefully and do it in full. Text analysis is a complex subject. It is more than just splitting and measuring the text.

shallawa • Apr 08 '25 22:04

I hear the concerns about the text analysis implied by this proposal. As currently proposed, there is an implicit step to convert to TextGroups (aka text runs), then each is shaped, then the shaping results are exposed. And all of those steps are non-trivial and not even consistently done between browsers in some cases. But what is the alternative for achieving the desired enhancements for authors?

In particular, TextGroups as proposed are a different level of granularity that in most cases would be exactly the same as the input string. Mixed bidi runs are not common. So making TextGroups the unit for display etc does not add anything most of the time.

To achieve per-glyph positioning and shading it is necessary to provide per-glyph data in some form. There are other ways to go about this such as text-on-path and some form of styling per-glyph, but they are less general than this proposal. Is that what you would propose to address the use cases?

Regarding things like indexFromPosition, these are based on existing DOM APIs that already handle all the bidi re-ordering and shaping implicitly. It's even possible to write JS to produce the same output as getTextClusters() does, though with a lot more script and computation. That is, use DOM APIs for offsetFromPoint to search for the position transition points, then use the substrings implied by that analysis as the text clusters. This proposal is exposing the same thing without all the JS.
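The script-only approach mentioned above can be sketched against any offset-to-index function. Here `nearestIndex` is a hypothetical stand-in for a DOM caret query; the sketch sweeps offsets at a fixed step and records every distinct index returned, so a boundary that is never returned (for example, inside a ligature) ends up inside a larger cluster. Pairing consecutive indices only works for single-direction runs; mixed bidi breaks it, as noted earlier in the thread.

```javascript
// Derives cluster ranges by sweeping offsets across the text and recording
// each distinct index that indexAt() returns, in sweep order.
function clustersBySweep(indexAt, width, step = 0.5) {
  const boundaries = [];
  let samples = 0;
  for (let offset = 0; offset <= width; offset += step) {
    samples++;
    const index = indexAt(offset);
    if (boundaries[boundaries.length - 1] !== index) boundaries.push(index);
  }
  const clusters = [];
  for (let i = 1; i < boundaries.length; i++) {
    const a = boundaries[i - 1];
    const b = boundaries[i];
    // min/max handles either sweep direction (LTR rises, RTL falls).
    clusters.push({ begin: Math.min(a, b), end: Math.max(a, b) });
  }
  return { clusters, samples };
}

// Hypothetical text whose characters 1 and 2 form one ligature glyph:
// caret boundaries exist at x = 0, 10, 22, 27, 37 for indices 0, 1, 3, 4, 5.
const edges = [[0, 0], [10, 1], [22, 3], [27, 4], [37, 5]];
const nearestIndex = (offset) =>
  edges.reduce((best, e) =>
    Math.abs(e[0] - offset) < Math.abs(best[0] - offset) ? e : best
  )[1];

const { clusters, samples } = clustersBySweep(nearestIndex, 37);
// clusters: [0,1), [1,3), [3,4), [4,5), found with 75 index queries
```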

schenney-chromium • May 06 '25 13:05

@shallawa Would you be willing to schedule a half hour to talk about the API with a goal of converging to a solution?

schenney-chromium • Jun 12 '25 00:06