wordcloud2.js fix #157

I’ve had a go at fixing the element layout issue (#157) and think I’ve made significant improvements. I’ve basically switched to using text baseline as the basis for positioning all text (canvas and HTML). To do this, I've had to use further TextMetrics properties* and write a HTML version of getTextInfo (imaginatively named getHtmlTextInfo) that makes the necessary calculations for elements to be positioned. It’s going to be less performant as the elements obviously need to be in the DOM to retrieve the info needed but I think it’s a fair trade-off for the improvements gained. The function is only called when using elements so canvas only clouds will be unaffected.** I’ve also included a little debug feature, along the lines of the canvas version, which visualises the various calculations against the rendered text.

There are a couple of comments regarding / mu where I wasn’t 100% sure on the calculations so those would need to be checked. I’m sure further improvements can be made, it’s not perfect. Hopefully someday browsers will iron out these inconsistencies or give us the tools to get around them properly ourselves…

* The TextMetrics properties used are actualBoundingBoxAscent and actualBoundingBoxDescent. These aren’t fully supported in all browsers, most notably IE and Firefox for Android. In these cases, I’ve fallen back to the calculations made by getHtmlTextInfo. These aren’t perfect – ascent/descent is calculated as the distance between the baseline and top/bottom of the element, not the text itself (as there is no way to do this) – but seem to provide reasonable results, although seemingly more so for some characters/languages than others. Again, it’s all a bit of trade-off but it ensures these users will at least see something, while improving the layout the most for the majority of users.

** When viewed in a browser that support the necessary TextMetrics properties mentioned above.

Mar 15 '21 13:03 bigwigal

Thanks for taking the time to do the investigation. This project is almost a decade old for me so I am going to assume you have figure it out for anything that I no longer recall (and that my younger me didn't document.)

No worries. Thanks for creating the library in the first place, it's been really useful.

#157 reads bad but I couldn't repro, so having a screenshot there will be greatly appreciated. I would also like to know if this is a long standing issue that happens on specific fonts, or this happens on newer browsers because of Canvas2D API change.

I've added screenshots to #157 so you/others can easily see the issue. I'm not able to confirm when this first appeared, for reasons noted in the update to the issue.

My concern here is that you may be overfitting for specific fonts, and these update will give us worse result on some other fonts. If you done enough testing to say this is not the case, I am happy to merge this to make the library better.

I completely understand your concerns and I think this issue does effect certain fonts/characters more than others, with fonts like Arial, Times etc. the most effected. I've checked all examples on the demo page and there doesn't seem to be any loss in accuracy in any of them, an improvement if anything.

textBaseline = "alphabetic"; being a particular problematic issue — with these changes I am not sure if CJK ideographic will ended up being mis-positioned (that's why I originally go with "middle".)

The CJK characters used in the examples on the demo page all seem fine with these changes, slight improvements if anything. I've seen some minor overlapping in the 紅樓夢 cloud, on that text actually (the largest), but there doesn't appear to be any in the fixed version. Things do move about a little but not enough to overlap.

I am also not sure why getHtmlTextInfo() is called in both getTextInfo() and drawText() — so we will always need the information in getHtmlTextInfo() to render in HTML? That function is non-trivial and will probably slow things down a lot. Do we really need to move the Y transform there?

Yes, getHtmlTextInfo() will always be needed to render to HTML (under this implementation), and canvas if the required TextMetrics properties described above aren't supported (IE11 etc. - the reason for the conditional call in getTextInfo()). I agree that it's far from ideal but I can't see a way to reliably position elements otherwise; it will always be best guess and results will vary. I could add the fix as a conditional (layout) optimisation that is triggered when some API flag is set or, taking that a little further, have a list/array of 'problem' fonts that the fix could apply to, that could then be extended via the API (something like optimiseFontAccuracy that defaults to false but could be set to true or something like ['arial']). That way users/developers could see if it makes sense to take on the overhead for themselves. I'm happy to take a look at this. I did consider it first time round due to the overhead but thought it may get a little messy. I'm aware that I'm biased toward Latin fonts/characters in my current use cases, although that may not always be the case (I work for a large uni and our material is very wide ranging). What do you think? Re the Y transform, this is needed to mirror the canvas implementation, i.e. aligned to text baseline and then offset to text centre from there. The calculation for this and top should really be done together and then applied as needed individually.

I'll have a proper look at your above suggestions in the meantime, all look ok at a glance.

Mar 18 '21 16:03 bigwigal

It's possible to merge it ?

Oct 19 '22 21:10 tclavier

Hi, please address my review comments and request for re-review.

Oct 26 '22 03:10 timdream

wordcloud2.js wordcloud2.js copied to clipboard

fix #157

wordcloud2.js
wordcloud2.js copied to clipboard