Add nerdamer.convertFromLaTeX

Open Happypig375 opened this issue 5 years ago • 60 comments

Yeah, I was considering adding this as an additional add-on. I may decide to strip Guppy of everything but the parser and go that route. Thanks for the suggestion.

jiggzson avatar Jan 08 '19 18:01 jiggzson

I've recently started writing a PR for the vscode extension LaTeX-Workshop, to which I'd like to add the ability to go from LaTeX math statements → nerdamer evaluation → LaTeX math statement.

Suffice to say, the ability to use LaTeX input would be rather handy; I was planning on writing it from scratch, so if there's anything I can do to speed this development along or improve it, just let me know :grin:

tecosaur avatar Jan 13 '19 00:01 tecosaur

@tecosaur, thank you for letting me know. It's always nice to know how the library is being used. I started writing a LaTeX importer utilizing the existing parser and I feel like I've had some fairly decent results but I always welcome any suggestions or contributions. I guess it would be helpful to know which functions or use cases to focus on.

jiggzson avatar Jan 13 '19 02:01 jiggzson

Ideally at least all results from nerdamer.convertToLaTeX should be consumable for nerdamer.convertFromLaTeX.

Happypig375 avatar Jan 13 '19 03:01 Happypig375

@Happypig375, sounds good. Here's an interesting one. lim(x, x, Infinity)+1 outputs the exact same TeX as lim(x+1,x,Infinity). What am I missing?

jiggzson avatar Jan 13 '19 03:01 jiggzson

@jiggzson I just had a quick squiz at your commit from 3 days ago that added basic support, and correct me if I'm wrong, but are you supporting inbuilt nerdamer functions (e.g. \\sin to sin()) and then just writing if statements for latex to ascii substitutions here?

https://github.com/jiggzson/nerdamer/blob/b7bae8427ed1e770a72e5b3028369ab43eb5969c/nerdamer.core.js#L8478-L8489

(this is particularly with regards to the \\cdot line)

If so, I'm liable to rewrite this to take an object representing ascii equivalents to LaTeX codes and loop through all the substitutions instead.

Example Object

{
    '\\times': '*',
    '\\int': 'integral(#1)',
    // etc.
}
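A minimal sketch of how such a table could be looped over, assuming plain string replacement. The names `substitutions`, `escapeRegExp` and `applySubstitutions` are made up for illustration and are not part of nerdamer. Templated entries like `'integral(#1)'` need real argument handling and are left out here.

```javascript
// Hypothetical substitution table: LaTeX command -> ASCII equivalent.
// The entries are examples only, not nerdamer's actual mapping.
const substitutions = {
    '\\times': '*',
    '\\cdot': '*',
    '\\div': '/',
};

// Escape regex metacharacters so each key is matched literally.
const escapeRegExp = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

// Replace every known command. The (?![a-zA-Z]) lookahead stops a key
// such as \times from matching the start of a longer command name.
function applySubstitutions(latex, table) {
    return Object.entries(table).reduce(
        (out, [command, ascii]) =>
            out.replace(new RegExp(escapeRegExp(command) + '(?![a-zA-Z])', 'g'), ascii),
        latex
    );
}

console.log(applySubstitutions('2 \\times 3 \\cdot x', substitutions)); // "2 * 3 * x"
```

Keeping the table as data means users could extend it without touching the loop.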

On the note of the second line of my example object - what's the current provision for statements such as: \\int x dx? At the moment I see TypeError: dx[0] is undefined

tecosaur avatar Jan 13 '19 10:01 tecosaur

@tecosaur, you are correct. Keep in mind that the current implementation is just a very rough draft put together over my lunch break. I was curious to see if it would work and so far it seems promising.

If so, I'm liable to rewrite this to take an object representing ascii equivalents to LaTeX codes and loop through all the substitutions instead.

Go for it. This instance was just a test but looping through objects would definitely be better. This works fine for "keywords" and the majority of functions but not for functions such as integral and limit, hence the if statements.

Here's my logic for the current approach. Declare \ as a prefix operator. This way it comes back as [operator, variable]. Another pass can be performed as you mentioned to make substitutions and reorganize arguments of certain functions. The back slash can just be bypassed and the variable substituted. Most functions just become name(arguments) etc. You can use console.log(_.pretty_print(raw_tokens)) at the beginning of LaTeX.parse to see what your input looks like.

This {x}^{2} \cdot \mathrm{max}\left(a,b\right) - 3 outputs this ((x), ^, (2), \, cdot, \, mathrm, (max), \, left, (a, ,, b, \, right), -, 3). The problem becomes functions such as int and limit since the arguments can vary. int is used for both integrals and definite integrals so I don't know if a blind substitution will work.

I'm not married to any particular way of doing things, so if you find another solution more efficient, then let's go that route. I just took a swing at it to see where it goes.

jiggzson avatar Jan 13 '19 14:01 jiggzson

I think my Maths teacher told me that if limits contained addition or subtraction they must be parenthesized. Therefore, lim(x+1,x,Infinity) should be \lim_{x\to\infty}\left(x+1\right).

Happypig375 avatar Jan 13 '19 14:01 Happypig375

@jiggzson the more I think about it I think the approach should be split up into two-ish major components.

1

A function that reads a string and extracts general LaTeX-y statements.

Would take a form

\\command[optional_argument_1][optional_argument_2][...]{main_argument_1}{main_argument_2}{...}

And probably

\\command[optional_argument]_{lower_limit}^{upper_limit}

2

A method to convert identified LaTeX functions into ascii / nerdamer equivalents.

tecosaur avatar Jan 13 '19 14:01 tecosaur
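A rough sketch of the two components, assuming commands follow the `\command[optional]{main}` shape described above. `readGroup`, `readCommand` and `converters` are hypothetical names, not nerdamer APIs, and nested arguments are returned as raw text rather than converted recursively.

```javascript
// Read a balanced group starting at src[pos], which must be `open`.
// Returns the inner text and the index just past the closing delimiter.
function readGroup(src, pos, open, close) {
    let depth = 0;
    for (let i = pos; i < src.length; i++) {
        if (src[i] === open) depth++;
        else if (src[i] === close && --depth === 0) {
            return { text: src.slice(pos + 1, i), end: i + 1 };
        }
    }
    throw new Error(`Unbalanced ${open} at index ${pos}`);
}

// Part 1: parse one command of the form \name[opt]...{arg}... at `start`.
function readCommand(src, start = 0) {
    const name = /^\\([a-zA-Z]+)/.exec(src.slice(start));
    if (!name) return null;
    let pos = start + name[0].length;
    const optional = [];
    const required = [];
    while (src[pos] === '[') {
        const group = readGroup(src, pos, '[', ']');
        optional.push(group.text);
        pos = group.end;
    }
    while (src[pos] === '{') {
        const group = readGroup(src, pos, '{', '}');
        required.push(group.text);
        pos = group.end;
    }
    return { name: name[1], optional, required, end: pos };
}

// Part 2: hypothetical converters, command name -> ASCII builder.
const converters = {
    frac: ({ required: [num, den] }) => `(${num})/(${den})`,
    sqrt: ({ optional: [n], required: [x] }) =>
        n === undefined ? `sqrt(${x})` : `(${x})^(1/(${n}))`,
};

const cmd = readCommand('\\sqrt[3]{8}');
console.log(cmd.name, cmd.optional, cmd.required); // sqrt [ '3' ] [ '8' ]
console.log(converters[cmd.name](cmd));            // (8)^(1/(3))
```

Subscript and superscript limits (the second form above) are not handled in this sketch.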

@Happypig375, I'll have to look and see if the libraries you mentioned at the beginning follow that rule. That would be awesome.

@tecosaur, that's sort of what's happening right now, isn't it? The only difference is that in part 1 the \ acts as a bypass. After that part 2 happens, which is nothing more than a loop which filters out and replaces commands. The whole thing then gets "stitched" back together. Unless I'm misunderstanding you.

jiggzson avatar Jan 13 '19 15:01 jiggzson

@jiggzson I'll describe the example that made me think that point one isn't fully functional at the moment.

Regarding nerdamer.convertFromLatex( ... ).toTeX()

\\sqrt{4} is converted to 2

\\sqrt[3]{8} becomes something like sqrt3 \\cdot 8 when it should be 2

tecosaur avatar Jan 13 '19 17:01 tecosaur

In a similar vein, here's another example showing why I think there's a need for a more robust implementation of this.

nerdamer('sum(2x,x,1,5)').toTeX() = "30"
nerdamer.convertFromLaTeX('\\sum_{x=1}^5 2x').toTeX() = "x=1"

If the implementation would get \\sum as a LaTeX statement and extract the limits and text afterwards and pass it to the converter I imagine we could (once that's in place) fairly easily turn that into sum(2x,x,1,5).
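As a rough illustration of that extraction, here is a regex-based sketch. `convertSum` is a hypothetical helper, not a nerdamer function, and it assumes the whole input is a single `\sum` with an `index=lower` subscript and either a braced or single-character upper limit.

```javascript
// Turn "\sum_{i=a}^{b} body" (or "^b" with a one-character limit)
// into the ASCII form sum(body,i,a,b). Returns null if the shape differs.
function convertSum(latex) {
    const pattern = /^\\sum_\{([a-zA-Z]+)=([^{}]+)\}\^(?:\{([^{}]+)\}|(\S))\s*(.+)$/;
    const match = pattern.exec(latex.trim());
    if (!match) return null;
    const [, index, lower, bracedUpper, bareUpper, body] = match;
    return `sum(${body},${index},${lower},${bracedUpper ?? bareUpper})`;
}

console.log(convertSum('\\sum_{x=1}^5 2x'));      // "sum(2x,x,1,5)"
console.log(convertSum('\\sum_{x=1}^{10} x^2'));  // "sum(x^2,x,1,10)"
```

A regex like this cannot cope with nested braces in the limits, which is one reason a token-based approach may be the better long-term route.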

I have an idea for the implementation of the second part which I'll get started on tonight. I'll let you know when I have a working prototype (should be soon).

tecosaur avatar Jan 13 '19 18:01 tecosaur

\sqrt{4} is converted to 2

That behavior is related to nerdamer. The square root of a perfect square is always evaluated.

\sqrt[3]{8} becomes something like sqrt3 \cdot 8 when it should be 2

I see what you mean.

I have an idea for the implementation of the second part which I'll get started on tonight. I'll let you know when I have a working prototype (should be soon).

Sounds exciting. Keep me posted.

Thanks!

jiggzson avatar Jan 13 '19 18:01 jiggzson

\sqrt{4} is converted to 2

That behavior is related to nerdamer. The square root of a perfect square is always evaluated.

To me that's desirable behaviour :grin: I'd much rather 2 than root 4.

I'm also creating a series of tests/assertions to be used as a benchmark/scoring/progress system of sorts. So my attention is divided, but you'll get two things for the price of one :stuck_out_tongue:

tecosaur avatar Jan 13 '19 18:01 tecosaur

Update - Testing

I've written up a few tests using the mocha framework (bdd) and here's the current 'report card'.

image

I think we have room for improvement.

I'll add a few more and create a PR with this tomorrow if you like.

tecosaur avatar Jan 13 '19 22:01 tecosaur

@tecosaur, sounds good. Will your PR include functions and a fix for the nthroot issue as well? Also, a number of those items are formatting commands, which don't really map to anything in nerdamer, correct?

jiggzson avatar Jan 14 '19 00:01 jiggzson

@jiggzson Thought I'd put what I have so far up.

The reason why I've put 'formatting commands' such as \bigg up is because right now they mess up the output, which IMO - they shouldn't. See the example below.

2 \\bigg(1+1\\bigg)

should give "4" but instead gives

2 \\bigg(1+1\\bigg)
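One possible way to drop such commands before parsing, sketched with a plain regex. The list in `formattingCommands` is illustrative and far from exhaustive, and `stripFormatting` is a made-up helper rather than anything in nerdamer.

```javascript
// Sizing/delimiter commands that carry no mathematical meaning.
// This list is an assumption for illustration only.
const formattingCommands = ['left', 'right', 'big', 'Big', 'bigg', 'Bigg'];

// Remove each formatting command. The (?![a-zA-Z]) lookahead keeps
// commands such as \leftarrow intact.
function stripFormatting(latex) {
    const pattern = new RegExp(
        `\\\\(?:${formattingCommands.join('|')})(?![a-zA-Z])`,
        'g'
    );
    return latex.replace(pattern, '');
}

console.log(stripFormatting('2 \\bigg(1+1\\bigg)')); // "2 (1+1)"
```

The stripped string `2 (1+1)` is then ordinary input that should evaluate to 4.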

ATM that's just the unit tests (well, more of a draft for some unit tests), but code for even more tests, and making more of those test pass (or at least working toward that) should come along too in due course :)

tecosaur avatar Jan 14 '19 00:01 tecosaur

@jiggzson I'm somewhat lost with regards to something and I'm hoping you can tell me what I'm missing.

I'm trying to resolve the first set of test cases I have defined (functions that are currently \\mathrm{func} that should be \\func), so I replaced the current return at the end of parser.toTeX(...) with the following:

            const InbuiltLaTeXFunctions = ['arccos', 'cos', 'csc', 'exp', 'ker', 'limsup', 'min', 'sinh', 'arcsin', 'cosh', 'deg', 'gcd', 'lg', 'ln', 'Pr', 'sup', 'arctan', 'cot', 'det', 'hom', 'lim', 'log', 'sec', 'tan', 'arg', 'coth', 'dim', 'inf', 'liminf', 'max', 'sin', 'tanh']
            const replaceInbuiltFunctions = (s) => s.replace(new RegExp('\\\\'+`mathrm{(${InbuiltLaTeXFunctions.join('|')})}`, 'gm'), '\\$1')

            return replaceInbuiltFunctions(TeX.join(' '));

However, this doesn't seem to have changed anything...

Update: I've discovered it's not parser.toTeX I wanted but LaTeX.value.

tecosaur avatar Jan 14 '19 22:01 tecosaur

Other changes I'm making are successful. I'll create separate pull requests so that different sets of changes can be reviewed separately.

tecosaur avatar Jan 14 '19 22:01 tecosaur

Previous State of Affairs

Current State of Affairs

This is with the three PRs above being accepted image

tecosaur avatar Jan 15 '19 00:01 tecosaur

@jiggzson A question.

With LaTeX new operators and commands can be defined via methods such as

\DeclareMathOperator{\sech}{sech}

and

\newcommand{\dd}{\ensuremath{\mathrm{d}}}

In my vscode extension I plan on trying to identify such lines and add them to some sort of config. I would imagine some users would also find this quite useful as they would be able to add their own commonly used substitutions.

I'm trying to locate the relevant section for this sort of config and thought it could be a good idea to ask you.
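For the identification step, something along these lines might do. This is only a sketch: `collectMathOperators` is a hypothetical helper, it handles `\DeclareMathOperator` (and its starred form) only, and how the result would be fed into nerdamer is left open.

```javascript
// Scan LaTeX source for \DeclareMathOperator{\name}{text} declarations
// and return them as a { name: text } lookup object.
function collectMathOperators(source) {
    const pattern = /\\DeclareMathOperator\*?\{\\([a-zA-Z]+)\}\{([^{}]*)\}/g;
    const operators = {};
    for (const [, command, text] of source.matchAll(pattern)) {
        operators[command] = text;
    }
    return operators;
}

const preamble = [
    '\\DeclareMathOperator{\\sech}{sech}',
    '\\DeclareMathOperator*{\\argmax}{arg\\,max}',
].join('\n');

console.log(collectMathOperators(preamble));
// { sech: 'sech', argmax: 'arg\\,max' }
```

`\newcommand` bodies can contain nested braces, so they would need a balanced-brace reader rather than a single regex.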

tecosaur avatar Jan 15 '19 20:01 tecosaur

Also another issue I've encountered (can be seen in this image)

image

I don't think this is an issue with 'times', as 2 times 2 producing 4 doesn't seem too outlandish. However, apart from LaTeX itself not doing this, there are many more commands, such as big, left and pm, where I doubt it would be desired behaviour.

I imagine the best thing to do would be to add a 'flag' of sorts and have it so that it only performs the replacement if it is set.

https://github.com/jiggzson/nerdamer/blob/b7bae8427ed1e770a72e5b3028369ab43eb5969c/nerdamer.core.js#L8467-L8505

tecosaur avatar Jan 15 '19 21:01 tecosaur

@tecosaur, do you think it's related to this (7bd13d2) commit? With that commit I added support for word operators but I forgot to remove my test operator times.

jiggzson avatar Jan 15 '19 21:01 jiggzson

Hmmm. I'm not sure what would be easiest to implement.

  1. Determine if slash was beforehand
  2. Add flags
  3. Concatenate slash with command

@jiggzson Since you've written this, I thought you might have some ideas regarding this.

Notes

I don't think it's related to a times specific commit, here's why:

image

image screenshot from 2019-01-15 22-03-41

tecosaur avatar Jan 15 '19 21:01 tecosaur

@tecosaur, I'm in the same boat as you. I'm taking ideas as well. My only suggestion was to cut time by reusing the existing tokenizer. I figured we can do this since quite a bit of the LaTeX is just formatting and can be discarded. As I mentioned before the idea is then to declare \ as an operator and then feed the string to the Parser.tokenize, apply a filter pass to re-arrange the tokens, and then glue it back. Let Parser.parse worry about precedence etc.

Example

nerdamer.convertToLaTeX('integrate(x,x)');
// '\int {x}\, dx'

Parser.tokenize will then generate (\, int, (x), dx). The brackets denote an array. The slash can be discarded, int can be substituted for integrate, after int comes the function neatly in an array which can be fed back to LaTeX.parse to make sure that's formatted correctly, and dx can be stripped of the d.

If we look at \\frac{1}{2}, this produces (\, frac, (1), (2)). When encountering frac we know that the following two arrays are just divided by each other.
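To make the frac case concrete, here is a toy version of such a filter pass. It works on plain nested arrays of strings rather than nerdamer's real token objects, and `filterPass` and `stitch` are names invented for this sketch.

```javascript
// Rebuild a token list: drop the slash marker, recurse into groups,
// and rewrite \frac (a) (b) as (a) / (b).
function filterPass(tokens) {
    const out = [];
    for (let i = 0; i < tokens.length; i++) {
        const token = tokens[i];
        if (Array.isArray(token)) {
            out.push(filterPass(token)); // recurse into bracketed groups
        } else if (token === '\\') {
            continue; // the slash is only a marker
        } else if (
            token === 'frac' && tokens[i - 1] === '\\' &&
            Array.isArray(tokens[i + 1]) && Array.isArray(tokens[i + 2])
        ) {
            // \frac is followed by two groups: numerator and denominator
            out.push(filterPass(tokens[i + 1]), '/', filterPass(tokens[i + 2]));
            i += 2; // skip the two groups just consumed
        } else {
            out.push(token);
        }
    }
    return out;
}

// Glue nested arrays back into a string, parenthesising each group.
const stitch = (tokens) =>
    tokens.map((t) => (Array.isArray(t) ? `(${stitch(t)})` : t)).join('');

console.log(stitch(filterPass(['\\', 'frac', ['1'], ['2']]))); // "(1)/(2)"
```

The stitched string can then go through the normal parser, which takes care of precedence.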

I don't know if it's more efficient to just write everything from scratch or to go the proposed route. Starting from scratch just seems like a lot of work for a method that really only needs to handle a few cases.

jiggzson avatar Jan 15 '19 22:01 jiggzson

I have an idea. I'll get back to you in a few minutes.

tecosaur avatar Jan 15 '19 22:01 tecosaur

Here's my idea.

            // add slash info
            for (let tokenIndex = 0; tokenIndex < raw_tokens.length - 1; tokenIndex++) {
                const token = raw_tokens[tokenIndex];
                if (raw_tokens[tokenIndex+1].type === 'VARIABLE_OR_LITERAL') {
                    if (token.type === 'OPERATOR' && token.value === '\\') {
                        raw_tokens[tokenIndex+1].command = true;
                    } else {
                        raw_tokens[tokenIndex+1].command = false;
                    }
                }
            }

The idea works, but it introduces issues because it runs before the filter function. I'll investigate putting it into the filter.

tecosaur avatar Jan 15 '19 22:01 tecosaur

Will you then be handling the commands on another pass? Another risk you run is that sometimes the \ can denote a space or operator but you can test for that as well.

The way the filter pass currently works is by rebuilding the tokens array. If it sees big or \ or left etc. it just doesn't add it and continues. If it encounters a function that needs reformatting, it looks ahead and adds it in the correct order on the new stack. If you find your idea easier then play around with it and see what results you get. If it works go for it.

jiggzson avatar Jan 15 '19 22:01 jiggzson

I had an issue with nested commands, however that issue is now fixed, and I think I have a robust solution.

[in filterTokens]

if (typeof token.command === 'undefined' &&
    i > 0 && ['VARIABLE_OR_LITERAL', 'FUNCTION'].includes(token.type)) {
    if (tokens[i-1].type === 'OPERATOR' && tokens[i-1].value === '\\') {
        token.command = true;
    } else {
        token.command = false;
    }
}

Let me know if you can see any issues.

tecosaur avatar Jan 15 '19 23:01 tecosaur

A few things.

Once again did something funny with the last PR, so you got a bonus change.

~~The Travis CI error with the second commit seems to be something funny with Travis? I did the full test suite on my machine and none of the tests (except for the in-progress latex import functionality tests I added) failed.~~ Edit: Found some git-added text, removed it, everything fine now

Current State of Affairs

image image

If you were beginning to think it looked like we were almost there:

image

If you (or anyone else who's interested - @Happypig375 ?) have any ideas for good test cases, please send me a PR at https://github.com/tecosaur/nerdamer/tree/dev and I'll be sure to include them :)

tecosaur avatar Jan 15 '19 23:01 tecosaur