Cannot always parse a unit deserialized from JSON via the expression parser

Open josdejong opened this issue 2 months ago • 10 comments

From discussion https://github.com/josdejong/mathjs/discussions/3031#discussioncomment-14692393

When formatting a unit, it is not always possible to parse it via the expression parser when it was revived from JSON data.

Example:

const unit1 = math.evaluate('t=9cm * 7in^2')
console.log('unit1', unit1.toString())
// "24.803149606299215 in^3"

const data = JSON.stringify(unit1, math.replacer)
console.log('data', data)
// {"mathjs":"Unit","value":63,"unit":"cm in^2","fixPrefix":false}

const unit2 = JSON.parse(data, math.reviver)
console.log('unit2', unit2.toString())
// "63 cm in^2"

console.log(math.parse(unit2.toString()));
// SyntaxError: Value expected (char 9)

The outcome 63 cm in^2 is technically correct, but the parser cannot parse this due to a conflict with in interpreted as the unit conversion operator a in b instead of the unit inch.

It looks like serializing and reviving loses some of the original information, since after deserialization the .toString() method gives a different outcome.

Maybe we should normalize the unit before serialization, or add additional state ensuring that the unit .toString() gives the same output as the original.

josdejong avatar Oct 22 '25 09:10 josdejong

It turns out that the serialized unit loses the property this.skipAutomaticSimplification.

Still, we can address this issue in two ways:

  1. serialize the property skipAutomaticSimplification
  2. normalize the unit before serialization

Or we can do both 😄
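To make option 1 concrete, here is a toy round trip in plain JavaScript. It does not use mathjs: the unit is modelled as a plain object, and the names `skipSimplify`, `serializeLossy`, `serializeWithFlag`, and `revive` are stand-ins invented for this sketch. It also assumes that a revived unit falls back to a default of "skip simplification", which matches the unsimplified output seen above but is not confirmed in this thread.

```javascript
// Toy model for illustration only, not the mathjs JSON format.
// "skipSimplify" stands in for the skipAutomaticSimplification property.

function serializeLossy (unit) {
  // Mirrors the reported behaviour: the flag is not written out.
  return JSON.stringify({ value: unit.value, unit: unit.unit })
}

function serializeWithFlag (unit) {
  // Option 1: write the flag so a revived unit can format like the original.
  return JSON.stringify({
    value: unit.value,
    unit: unit.unit,
    skipSimplify: unit.skipSimplify
  })
}

function revive (json) {
  const data = JSON.parse(json)
  // Assumption: a missing flag falls back to the default (true).
  return {
    value: data.value,
    unit: data.unit,
    skipSimplify: data.skipSimplify ?? true
  }
}

const original = { value: 63, unit: 'cm in^2', skipSimplify: false }

console.log(revive(serializeLossy(original)).skipSimplify)
// true (the flag was lost, so the revived unit would format differently)
console.log(revive(serializeWithFlag(original)).skipSimplify)
// false (the flag survived the round trip)
```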

josdejong avatar Oct 22 '25 09:10 josdejong

Then, separately we have the issue of a stringified unit not always being parsable. In the case above, the unit can be simplified such that we do not encounter the problem. But the following example is not parsable with the expression parser anyway:

const unit = math.evaluate('2 kg * 3 in^2')
console.log(unit.toString()) // "6 kg in^2"

console.log(math.parse(unit.toString()))
// Uncaught SyntaxError: Value expected (char 8)

What we could maybe do is: when the unit contains in (conflicting with the conversion operator), do not use implicit multiplication but stringify the unit like "6 kg*in^2".
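A minimal sketch of that stringification idea, assuming a unit can be described as a value plus a list of `{ name, power }` parts with positive powers only. The function `formatUnit` and the set `RESERVED_WORDS` are hypothetical names for illustration and do not reflect how mathjs formats units internally.

```javascript
// Hypothetical sketch, not the mathjs implementation.
// Unit names that also act as operators in the expression parser.
const RESERVED_WORDS = new Set(['in'])

function formatUnit (value, parts) {
  // Render each part as "name" or "name^power".
  const rendered = parts.map(({ name, power }) =>
    power === 1 ? name : `${name}^${power}`
  )
  // Use explicit multiplication as soon as one unit name is ambiguous.
  const ambiguous = parts.some(({ name }) => RESERVED_WORDS.has(name))
  return `${value} ${rendered.join(ambiguous ? '*' : ' ')}`
}

console.log(formatUnit(6, [{ name: 'kg', power: 1 }, { name: 'in', power: 2 }]))
// "6 kg*in^2"
console.log(formatUnit(6, [{ name: 'kg', power: 1 }, { name: 'm', power: 2 }]))
// "6 kg m^2"
```

Units without a conflicting name keep the implicit-multiplication form, so output only changes for the ambiguous cases.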

josdejong avatar Oct 22 '25 09:10 josdejong

> It turns out that the serialized unit loses the property this.skipAutomaticSimplification.
>
> Still, we can address this issue in two ways:
>
> 1. serialize the property `skipAutomaticSimplification`
> 2. normalize the unit before serialization

Yes, I thought something like that was going on. I think therefore at least (1) must be done, and I think that makes (2) moot, doesn't it?

gwhitney avatar Oct 22 '25 18:10 gwhitney

> But the following example is not parsable with the expression parser anyway:
>
> const unit = math.parse('6 kg in^2')

Yes this is clearly a bug. Shouldn't it just be that in followed by any operator is never the in operator but instead the in unit? I.e., fix the parser bug rather than work around it by changing the stringification, since one might conceivably directly write something like 6 lb in^2?

gwhitney avatar Oct 22 '25 18:10 gwhitney

> Shouldn't it just be that in followed by any operator is never the in operator but instead the in unit?

Yes indeed, I think that would work, that is a good idea. I think there are no other special edge cases with in and units, since units only have a specific notation (a list of units, each optionally with an exponent, which are implicitly multiplied).
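The lookahead rule can be sketched in plain JavaScript as follows. This is not the mathjs parser: `tokenize`, `classifyIn`, and the `FOLLOWERS` set are hypothetical names, the tokenizer only handles simple identifiers, numbers, and single-character operators, and treating a closing parenthesis or the end of the input like an operator is an assumption added for this sketch.

```javascript
// Hypothetical sketch of the proposed rule, not the mathjs parser.
// Tokens that, when they directly follow "in", mark it as the unit inch.
const FOLLOWERS = new Set(['+', '-', '*', '/', '^', ')'])

// Split an expression into identifiers, numbers, and single-character symbols.
function tokenize (expr) {
  return expr.match(/[A-Za-z_]\w*|\d+(?:\.\d+)?|[-+*\/^()]/g) || []
}

// Return one label per "in" token: 'unit' or 'operator'.
function classifyIn (expr) {
  const tokens = tokenize(expr)
  const labels = []
  tokens.forEach((token, i) => {
    if (token !== 'in') return
    const next = tokens[i + 1]
    // A conversion operator needs a right-hand operand, so "in" followed
    // by an operator (or by nothing at all) is read as the unit.
    const isUnit = next === undefined || FOLLOWERS.has(next)
    labels.push(isUnit ? 'unit' : 'operator')
  })
  return labels
}

console.log(classifyIn('6 kg in^2'))  // [ 'unit' ]
console.log(classifyIn('2 cm in in')) // [ 'operator', 'unit' ]
```

The rule only looks one token ahead, so it resolves cases such as `6 kg in^2` but cannot settle input that is ambiguous by construction.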

josdejong avatar Oct 29 '25 08:10 josdejong

The serialization issue is addressed via #3572.

I keep this issue open to also look into solving the parsing issue with math.parse('6 kg in^2').

josdejong avatar Oct 31 '25 08:10 josdejong

Just going back to my hacky solution, could we not make in the operator configurable? I noticed there's some legacy options that can be explicitly turned on? My understanding is in is kept for legacy reasons right?

I understand not wanting to complicate the parser though...

dpnova avatar Nov 01 '25 05:11 dpnova

Apart from solving the in issue in the implementation, I think having in as both a unit and an operator is not ideal, as it leaves some expressions ill-defined: in some cases it is not possible to tell which was intended, e.g.:

1 in in in

is that 1 inch^3 or 1 inch converted to inch?

There is a similar clash with min being both a unit and a function, but that seems fine because it is not an operator, and it should be followed by parentheses if the function was intended.

mlameiCT avatar Nov 02 '25 20:11 mlameiCT

> 1 in in in
>
> is that 1 inch^3 or 1 inch converted to inch?

Yeah that is a nice demonstration of the ambiguity that the operator in causes right now 😄. In case of doubt: use parentheses.

> Just going back to my hacky solution, could we not make in the operator configurable? I noticed there's some legacy options that can be explicitly turned on? My understanding is in is kept for legacy reasons right?

Good idea. We could deprecate the operator in over time (start with a warning explaining to use to instead of in), or put it behind a feature flag and turn it off by default in the future. Of course, you can already choose not to use the operator in yourself to prevent ambiguous situations.

josdejong avatar Nov 05 '25 11:11 josdejong

The original serialization issue is fixed now in [email protected] via #3572.

I'll keep this issue open to think through fixes for the parser on handling in.

josdejong avatar Nov 05 '25 13:11 josdejong