compute-engine icon indicating copy to clipboard operation
compute-engine copied to clipboard

Support to parsing incomplete Latex expressions

Open mindreframer opened this issue 4 years ago • 6 comments

Parsing incomplete (mathematically) expressions seems to leave some parts outs, eg:

parse("2 * 2 = ")

=> [ 'Multiply', { num: '2' }, { num: '2' } ]

parse("2 * 2 = 4")
=> [ 'Equal', [ 'Multiply', { num: '2' }, { num: '2' } ], { num: '4' } ]

This has the implication that chained parse / serialize calls would alter the initial input...

Will this ever work with in-complete expressions?

mindreframer avatar Apr 10 '21 11:04 mindreframer

test case:

import { expect } from "@jest/globals";
import { parse, serialize } from "@cortex-js/math-json";

function roundtrip(input: string) {
  return serialize(parse(input));
}
describe("parse", () => {
  it("changes input on roundtrip", () => {
    expect(roundtrip("2\\times2=4")).toBe("2\\times2=4");
    expect(roundtrip("2\\times2=")).toBe("2\\times2"); // misses trailing `=` sign
  });
});

mindreframer avatar Apr 10 '21 11:04 mindreframer

In general, there is no round-tripping guarantee. The Latex expression could include extra whitespace, comments or other syntactic idioms that are not preserved in the MathJSON expression. For example, both \frac12 and \frac{1}{2} result in the same MathJSON expression.

That said, in this particular example, I can see a case for attempting to preserve the "partial" expression. FYI, it currently returns a missing-operand error. Instead, it could return [ 'Equal', [ 'Multiply', 2, 2 ], 'Missing']. This would then serialize to 2 \times 2 = \placeholder.

arnog avatar Apr 10 '21 12:04 arnog

I see. Maybe this could be a configurable element. A placeholder has already different meaning and might be used for other purposes. Or - the missing operand could serialize to an space char. That would retain the input expression semantically. I think this would work out for my situation and would provide sound behavior. Wdyt?

mindreframer avatar Apr 10 '21 13:04 mindreframer

You can configure how the Missing symbol gets serialized:

const customLatex = new LatexSyntax( { dictionary: [
    ...LatexSyntax.getDictionary(), 
    { name: 'Missing', serialize: '\\placeholder'}
] } );
customLatex.serialize(['Equal', 'x', 'Missing']);

Replace '\\placeholder' with the desired output, including an empty string.

Does that work for you?

arnog avatar Apr 10 '21 16:04 arnog

Hey @arnog , it surely would! Only I'm not getting the "Missing" symbol, when I parse the expression

let res = parse("2\\times2=")
// ["Multiply",{"num":"2"},{"num":"2"}]

Do I have to configure something for this?

mindreframer avatar Apr 10 '21 21:04 mindreframer

No, that's the current behavior, but I'm going to change it to use the Missing symbol in this case. Stay tuned.

arnog avatar Apr 10 '21 22:04 arnog