mathlive

MathML: Using {,} as the decimal separator is translated to "."

Leon-Lj opened this issue (#2315) on Mar 06 '24 • 6 comments

If you have an expression that uses {,} as the decimal separator, converting it to MathML turns the comma into ".". For example, 0{,}2 is converted to <mn>0.2</mn>.

This is caused by the function asDigit() in atom-class.ts line 1131:

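  asDigit(): string {
    // A single ordinary atom whose value is a digit, "," or "."
    if (this.type === 'mord' && this.value && /^[\d,\.]$/.test(this.value))
      return this.value;

    // A group such as {,}: its body is the leading 'first' atom
    // followed by the "," atom. This case is normalized to "."
    if (this.type === 'group' && this.body?.length === 2) {
      if (this.body![0].type === 'first' && this.body![1].value === ',')
        return '.';
    }
    return '';
  }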

Changing the middle return statement from return '.'; to return ','; fixes the problem.
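To make the proposed change concrete, here is a minimal standalone sketch of the same logic with the separator emitted for {,} turned into a parameter. The AtomLike interface and the decimalSeparator parameter are hypothetical simplifications for illustration; they are not MathLive's actual Atom class or an existing MathLive option.

```typescript
// Hypothetical, simplified stand-in for MathLive's Atom class.
interface AtomLike {
  type: string;
  value?: string;
  body?: AtomLike[];
}

// Sketch of asDigit() with the separator emitted for {,} made configurable.
// decimalSeparator is a hypothetical parameter, not a MathLive option.
function asDigit(atom: AtomLike, decimalSeparator: string = '.'): string {
  // A plain digit, comma or period is passed through unchanged.
  if (atom.type === 'mord' && atom.value && /^[\d,.]$/.test(atom.value)) {
    return atom.value;
  }

  // A braced comma, {,}: a 'first' atom followed by the ',' atom.
  const body = atom.body;
  if (atom.type === 'group' && body && body.length === 2) {
    if (body[0].type === 'first' && body[1].value === ',') {
      return decimalSeparator;
    }
  }
  return '';
}

// 0{,}2 as a sequence of simplified atoms.
const atoms: AtomLike[] = [
  { type: 'mord', value: '0' },
  { type: 'group', body: [{ type: 'first' }, { type: 'mord', value: ',' }] },
  { type: 'mord', value: '2' },
];

console.log(atoms.map((a) => asDigit(a)).join(''));      // "0.2" (current behavior)
console.log(atoms.map((a) => asDigit(a, ',')).join('')); // "0,2" (proposed behavior)
```

Making the separator a parameter rather than hard-coding ',' would let both conventions coexist, though whether that is desirable is exactly what the rest of this thread discusses.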

Leon-Lj, Mar 06 '24 14:03

This is a deliberate choice. Consistently using "." to represent the decimal separator avoids ambiguities such as <mn>1,234</mn>. The agent rendering the MathML can then decide how to represent it, possibly taking user preferences into account.
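As a rough sketch of what "the agent decides" could look like: a consumer that receives the canonical <mn>0.2</mn> can parse its content and re-display it with the standard Intl.NumberFormat API for a chosen locale. The displayNumber helper and the locales used here are illustrative assumptions; actual renderers and screen readers may behave quite differently.

```typescript
// Hypothetical helper: re-display a canonical MathML number ("." as the
// decimal separator, no grouping) using the conventions of a given locale.
function displayNumber(canonical: string, locale: string): string {
  // Only handle plain unsigned decimals; leave anything else untouched.
  if (!/^\d+(\.\d+)?$/.test(canonical)) return canonical;
  // Preserve the number of fraction digits that the author wrote.
  const fractionDigits = canonical.split('.')[1]?.length ?? 0;
  return new Intl.NumberFormat(locale, {
    minimumFractionDigits: fractionDigits,
    maximumFractionDigits: fractionDigits,
    useGrouping: false,
  }).format(Number(canonical));
}

console.log(displayNumber('0.2', 'en-US')); // "0.2"
console.log(displayNumber('0.2', 'sv-SE')); // "0,2"
```

Because the input always uses ".", the agent never has to guess how to parse it; only the display step depends on the locale.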

arnog, Mar 06 '24 15:03

The problem with that is that it still leaves room for ambiguity. Some regions use "." as the thousands separator, which creates the same potential confusion.

Wikipedia has this to say about thousands separators:

Three ways to group the number ten thousand with digit group separators.

  1. Space, the internationally recommended thousands separator.
  2. Period (or full stop), the thousands separator used in many non-English speaking countries.
  3. Comma, the thousands separator used in most English-speaking countries.
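To see why a bare "1.234" or "1,234" is ambiguous, here is a small sketch that interprets the same string under two of the conventions listed above. The Convention type and the parseWith function are made up for illustration.

```typescript
// A numbering convention: which character groups thousands, which marks decimals.
interface Convention {
  group: string;
  decimal: string;
}

const englishStyle: Convention = { group: ',', decimal: '.' };
const continentalStyle: Convention = { group: '.', decimal: ',' };

// Interpret a digit string under a given convention
// (no validation of group sizes, just separator handling).
function parseWith(text: string, c: Convention): number {
  const withoutGroups = text.split(c.group).join('');
  return Number(withoutGroups.replace(c.decimal, '.'));
}

console.log(parseWith('1,234', englishStyle));     // 1234
console.log(parseWith('1,234', continentalStyle)); // 1.234
console.log(parseWith('1.234', englishStyle));     // 1.234
console.log(parseWith('1.234', continentalStyle)); // 1234
```

The same four characters yield values that differ by a factor of 1000, so whichever character is chosen for the decimal separator, some readers will misread it without additional context.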

As far as I can tell, MathML does not specify how a decimal number should be represented: the specification is silent on it, and MDN seems to say that any number representation should be fine.

So I don't think a MathML renderer can adjust what is displayed; it simply renders what is there. I would imagine that screen readers use the current locale to decide whether to read <mn>1,234</mn> as "one point two three four" or "one thousand two hundred thirty-four" (maybe @NSoiffer could chime in here). Visually, though, it seems to be up to the reader to interpret.

Leon-Lj, Mar 06 '24 17:03

Depending on which flavor of MathML you follow, you can have different interpretations.

MathML Core does suggest that decimal numbers should be represented with a decimal point: "Generally speaking, a numeric literal is a sequence of digits, perhaps including a decimal point, representing an unsigned integer or real number." (emphasis added).

"Full" MathML is even more vague, except to say that negative numbers are not allowed inside a <mn> tag. So <mn>onze</mn> is a perfectly valid MathML number, but <mn>-11</mn> is not.
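For example, a generator following this rule would emit the minus sign as a separate operator rather than inside the <mn>. A minimal sketch, where the integerToMathML name is hypothetical and U+2212 MINUS SIGN is one common choice for the operator:

```typescript
// Hypothetical helper: emit MathML for an integer while keeping <mn> unsigned.
function integerToMathML(n: number): string {
  if (!Number.isInteger(n)) throw new Error('integers only in this sketch');
  const digits = `<mn>${Math.abs(n)}</mn>`;
  // A negative value becomes an <mo> minus sign applied to an unsigned <mn>.
  return n < 0 ? `<mrow><mo>\u2212</mo>${digits}</mrow>` : digits;
}

console.log(integerToMathML(11));  // <mn>11</mn>
console.log(integerToMathML(-11)); // <mrow><mo>−</mo><mn>11</mn></mrow>
```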

However, you can be confident that the MathML produced by MathLive will generate numbers in a consistent manner, including their decimal representation. If you want a different representation, you can of course manipulate the output from MathLive to suit your needs.
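If you do need a comma in the output, one option is to post-process the MathML string, for example the string returned by MathLive's convertLatexToMathMl(). The sketch below is a naive regex-based rewrite. It assumes, as described above, that "." only ever appears inside <mn> as a decimal separator and is never used for grouping. It is not a MathLive feature, and a real application might prefer an XML parser over regular expressions.

```typescript
// Naive post-processing: swap the decimal point for a comma inside <mn> elements.
// Assumes "." only ever appears in <mn> as a decimal separator.
function useDecimalComma(mathml: string): string {
  return mathml.replace(
    /(<mn\b[^>]*>)([^<]*)(<\/mn>)/g,
    (_match: string, open: string, text: string, close: string) =>
      open + text.replace(/\./g, ',') + close,
  );
}

console.log(useDecimalComma('<mn>0.2</mn><mo>+</mo><mn>1.25</mn>'));
// <mn>0,2</mn><mo>+</mo><mn>1,25</mn>
```

Only the text inside <mn> elements is touched, so a "." used as an operator in an <mo> is left alone.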

arnog, Mar 06 '24 18:03

> MathML Core does suggest that decimal numbers should be represented with a decimal point: "Generally speaking, a numeric literal is a sequence of digits, perhaps including a decimal point, representing an unsigned integer or real number." (emphasis added).

In the full MathML spec there's an attribute called decimalpoint, which specifies the character used as the decimal separator (for example, when aligning numbers on the decimal point). While decimalpoint is not part of MathML Core, I would argue that Core simply reuses the language of the full spec, and what it means is "decimal separator".

But I agree that the intent is unclear. If they mean "decimal separator", the spec should say so; if they really mean a point, and only a point, that should be made explicit so there's no room for misinterpretation.

Leon-Lj, Mar 06 '24 19:03

MathML is agnostic about the block and decimal separators. The examples in the spec for <mn> include hex numbers and Roman numerals: basically, if it is meant to be interpreted as a number, it belongs in an <mn>. In the forthcoming MathML 4, we are introducing an attribute that allows authors to express their "intent" on how something should be spoken. This is meant for disambiguation for assistive technologies (AT). For example, |x| could be a determinant, an absolute value, a cardinality, a magnitude, and so on. We discussed how numbers should be represented, and the unanimous opinion (including from people in countries that use "," as the decimal separator) was to stick with "." as the decimal separator, as Arno did. The feeling was that systems accepting either were hard to use and mistakes were very common.

As for screen readers, they typically pass the contents off to the speech engine, so it is up to the speech engine how to speak the number, and that is typically locale dependent.

I should note that most MathML generators do a poor job of handling numbers with "."s and ","s in them. They typically break what should be a single <mn> into multiple <mn>s separated by <mo>s for the "." and ",". As a result, AT will do a particularly poor job of reading them unless it repairs them. MathCAT has options for listing the block separators and decimal separators so that it can attempt a repair using heuristics, but I haven't figured out how to set those reliably. The document language (author) or the voice being used (listener) will have a language tag, but not necessarily a region (country) code. For example, Mexico uses "." as the decimal separator, but most South American countries use ",". So if all MathCAT knows is that the current document is "es", it's basically a coin flip; if the document is "es-mx", MathCAT can set the options correctly automatically. I don't think other ATs attempt a repair, at least not one based on locale. Maybe Volker Sorge's Speech Rule Engine tries.
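The kind of repair described above can be sketched as a pass over a flat token list: whenever a separator operator sits between two number tokens, the three are merged back into one number. The Token type and the separators parameter are hypothetical simplifications; this is not MathCAT's actual algorithm or configuration, only an illustration of the idea.

```typescript
// Simplified token: an <mn> (number) or an <mo> (operator) with its text.
interface Token {
  kind: 'mn' | 'mo';
  text: string;
}

// Merge sequences like <mn>1</mn><mo>,</mo><mn>234</mn> back into one <mn>.
// `separators` is a hypothetical list of characters treated as part of numbers.
function mergeSplitNumbers(tokens: Token[], separators: string[]): Token[] {
  const out: Token[] = [];
  for (let i = 0; i < tokens.length; i++) {
    const t = tokens[i];
    const prev = out[out.length - 1];
    const next = tokens[i + 1];
    if (
      t.kind === 'mo' &&
      separators.includes(t.text) &&
      prev?.kind === 'mn' &&
      next?.kind === 'mn'
    ) {
      // Glue the separator and the following digits onto the previous number.
      out[out.length - 1] = { kind: 'mn', text: prev.text + t.text + next.text };
      i++; // the next <mn> has been consumed
    } else {
      out.push({ ...t });
    }
  }
  return out;
}

const split: Token[] = [
  { kind: 'mn', text: '1' },
  { kind: 'mo', text: ',' },
  { kind: 'mn', text: '234' },
  { kind: 'mo', text: '.' },
  { kind: 'mn', text: '5' },
];
console.log(mergeSplitNumbers(split, [',', '.']));
// [ { kind: 'mn', text: '1,234.5' } ]
```

This naive version would also merge an argument list such as f(1,2) into a single number, which is why a real repair needs locale information and further heuristics to decide which characters count as separators.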

My bottom-line advice is to try to set the language code of the document (or of the <math> element) as precisely as possible.
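To illustrate why the region subtag matters, the standard Intl.NumberFormat API can report the decimal separator that the runtime's locale data (CLDR) associates with a language tag. The decimalSeparatorFor helper is a hypothetical sketch, not part of MathCAT, and its results depend on the locale data bundled with the JavaScript runtime.

```typescript
// Look up the decimal separator for a BCP 47 language tag via Intl.
function decimalSeparatorFor(tag: string): string {
  const parts = new Intl.NumberFormat(tag).formatToParts(1.5);
  // Fall back to "." if the locale data reports no decimal part.
  return parts.find((p) => p.type === 'decimal')?.value ?? '.';
}

console.log(decimalSeparatorFor('es-MX')); // "."
console.log(decimalSeparatorFor('es-AR')); // ","
console.log(decimalSeparatorFor('es'));    // "," (generic Spanish; wrong for a Mexican reader)
```

With only "es", the lookup has to fall back to a default that is wrong for some Spanish-speaking readers; with "es-MX" or "es-AR" the answer is unambiguous.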

I hope that helps a little,

 Neil


NSoiffer, Mar 06 '24 20:03

> We discussed how numbers should be represented and the unanimous opinion (including those from countries that use "," for decimal separator) was to stick with "." as a decimal separator as Arno did. The feeling was that it was hard to use systems that accepted either and mistakes were very common.

Alright, then this issue can be closed as nothing to fix, working as intended. Thanks for the clarification. :)

Leon-Lj, Mar 07 '24 09:03