nalgebra
Matrix Display output alignment is not Unicode-aware
Hi,
Thanks for all the wonderful work on this library. I'm coming back after a Rust hiatus and I can't believe the const generics work as well as they do.
I'm using a custom scalar type that uses a combining overline character instead of a minus sign, to match the style of this reference work:
Somewhat ironically, a typesetting convention designed to make alignment easier in print breaks alignment for Matrix output using this custom type:
┌ ┐
│ 0 ̅1 0 0 │
│ 0 ̅1 0 0 │
│ ⅔ 0 1 0 │
│ 0 0 0 1 │
└ ┘
The Display implementation here uses .chars().count() instead of grapheme counting, which is what causes the misalignment seen here. I imagine this would also affect less frivolous custom Display impls, like Chinese characters.
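To make the mismatch concrete, here is a minimal sketch using only the standard library. The helper names char_width and approx_visible_width are made up for illustration and are not part of nalgebra. The second helper only skips the Combining Diacritical Marks block (U+0300 to U+036F), which is a rough stand-in for real grapheme segmentation, not a replacement for it.

```rust
/// Width as counted by `chars().count()`: one unit per Unicode scalar value.
fn char_width(s: &str) -> usize {
    s.chars().count()
}

/// Rough approximation of the visible width: ignore code points in the
/// Combining Diacritical Marks block (U+0300..=U+036F), which are drawn
/// on top of a neighbouring character. This is NOT full grapheme
/// segmentation; it only illustrates why the two counts differ.
fn approx_visible_width(s: &str) -> usize {
    s.chars()
        .filter(|c| !('\u{0300}'..='\u{036F}').contains(c))
        .count()
}
```

For a digit followed by a combining overline, "1\u{0305}", char_width reports 2 while approx_visible_width reports 1, so a formatter that pads based on the char count ends up one column short for that entry.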
unicode-segmentation is a large dependency to make a tiny edge case look nicer, so I'm not exactly saying this should be implemented differently, but I'd be interested if anyone else has had this issue or if there's a solution that doesn't require increasing the code size too massively.
I imagine this would also affect less frivolous custom Display impls, like Chinese characters
Is there a reason to put Chinese characters in an nalgebra::Matrix?
Some people may prefer to output numbers in Chinese numerals instead of the Arabic numeral system. A cursory glance at crates.io doesn't reveal any crates that have both nalgebra and a crate for that formatting as dependencies, so I think that's not common enough to have made it onto crates.io. I'd be interested to hear if anyone besides myself has tried to use Unicode characters that break character-level counting!
A potential solution would be to allow trait overloading for computing the length of a scalar in characters, at the level of the scalar. This would not help you if you were printing an f32 using fullwidth numerals to correctly align Arabic numerals with hanzi, but would permit a custom scalar type to align itself using unicode-segmentation.
@Ralith I don't believe that Hanzi numerals are popular in mathematical work, but fullwidth numerals can be useful for alignment purposes; e.g. in terminals well-formatted CJK characters tend to take up two horizontal cells rather than one, so using fullwidth numerals can be useful.
However, I think those will actually work fine, as it's a char count and not a byte count, so the fullwidth forms should be aligned correctly relative to one another, as long as they aren't being constructed using CJK combining characters (which are sadly not that well supported anywhere, in my experience). unicode-segmentation wouldn't actually help with aligning CJK with non-CJK characters either, since CJK taking up 2 cells is a property of terminals, not Unicode...
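As a small illustration of the char-count versus byte-count point, the sketch below compares ASCII digits with their fullwidth forms (U+FF10 to U+FF19). The helper byte_and_char_counts is a made-up name for this example. Each fullwidth digit is a single char but three bytes in UTF-8, so char-based padding treats it like an ASCII digit, even though a terminal typically draws it two cells wide.

```rust
/// Returns (byte length, char count) for a string slice.
fn byte_and_char_counts(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

/// ASCII "12".
const ASCII_DIGITS: &str = "12";

/// Fullwidth "12": U+FF11 FULLWIDTH DIGIT ONE, U+FF12 FULLWIDTH DIGIT TWO.
const FULLWIDTH_DIGITS: &str = "\u{FF11}\u{FF12}";
```

Both strings have a char count of 2, so a column made up only of fullwidth entries stays aligned with itself. Mixing the two forms in one column is where the terminal's two-cell rendering would show up.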
Another use case, beyond mine, is computer algebra systems or other libraries that want to implement symbolic expressions. Perhaps that's more common than different languages.
You make an excellent point: allowing the custom types to control their display width eliminates my main concern with a fix for this, the large size of unicode-segmentation. It also allows correct behavior even when the alignment doesn't follow graphemes. I would be happy to implement this solution if people are interested.
The val_width function could be moved into a trait that inherits from Display with a default implementation as it's written. The API would have an additional trait, which is unfortunate if it's not widely used, but the source code would change very little. (I don't think it makes sense to have this apply to anything besides Display, and having a ton of separate traits for the other formatting traits makes little sense, so it would require some refactoring of the macro to cover the extra case.)
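A rough sketch of what such a trait might look like follows. Everything here is hypothetical: the trait name DisplayWidth, the method display_width, and the Overlined scalar are invented for illustration and are not nalgebra API. For brevity the default ignores the formatter's precision, which the real val_width would still need to take into account.

```rust
use std::fmt::{self, Display};

/// Hypothetical trait (not part of nalgebra): lets a scalar report how many
/// columns its Display output occupies.
trait DisplayWidth: Display {
    /// Default: count chars of the formatted value, i.e. the char-counting
    /// behaviour described in this thread.
    fn display_width(&self) -> usize {
        self.to_string().chars().count()
    }
}

// Built-in scalars can simply opt in to the default.
impl DisplayWidth for f64 {}

/// Hypothetical custom scalar: negative values are shown with a combining
/// overline (U+0305) after each digit instead of a leading minus sign.
struct Overlined(i32);

impl Display for Overlined {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        if self.0 < 0 {
            for digit in self.0.unsigned_abs().to_string().chars() {
                write!(f, "{}\u{0305}", digit)?;
            }
            Ok(())
        } else {
            write!(f, "{}", self.0)
        }
    }
}

impl DisplayWidth for Overlined {
    /// One column per decimal digit; the overlines take no extra space.
    fn display_width(&self) -> usize {
        self.0.unsigned_abs().to_string().len()
    }
}
```

With this, Overlined(-1) formats to two chars but reports a width of 1, so the padding logic could use display_width without pulling in unicode-segmentation. A type that wants grapheme-accurate widths could still call that crate from its own impl.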
An important note is that fixing the issue on my end is more challenging than I originally thought. One solution I had considered was to add zero-width characters to the Display impl, so both positive and negative numbers had the additional spaces, but that only lets you pad more, not less. The result is pretty bad alignment, and it's quite brittle when you consider extensions like fractions or colored output.