nerdamer
Add nerdamer.convertFromLaTeX
Yeah, I was considering adding this as an additional add-on. I may decide to strip Guppy of everything but the parser and go that route. Thanks for the suggestion.
I've recently started writing a PR for LaTeX-Workshop, a VS Code extension, to which I'd like to add the ability to go from LaTeX math statement → nerdamer evaluation → LaTeX math statement.
Suffice to say, the ability to use LaTeX input would be rather handy. I was planning on writing it from scratch, so if there's anything I can do to speed this development along or improve it, just let me know :grin:
@tecosaur, thank you for letting me know. It's always nice to know how the library is being used. I started writing a LaTeX importer utilizing the existing parser and I feel like I've had some fairly decent results but I always welcome any suggestions or contributions. I guess it would be helpful to know which functions or use cases to focus on.
Ideally, at least every result of nerdamer.convertToLaTeX should be consumable by nerdamer.convertFromLaTeX.
@Happypig375, sounds good. Here's an interesting one: lim(x, x, Infinity)+1 outputs exactly the same TeX as lim(x+1,x,Infinity). What am I missing?
@jiggzson I just had a quick squiz at your commit from 3 days ago that added basic support. Correct me if I'm wrong, but are you supporting inbuilt nerdamer functions (e.g. \\sin becomes sin(...)) and then just writing if statements for LaTeX-to-ASCII substitutions here?
https://github.com/jiggzson/nerdamer/blob/b7bae8427ed1e770a72e5b3028369ab43eb5969c/nerdamer.core.js#L8478-L8489
(This is particularly with regards to the \\cdot line.)
If so, I'm liable to rewrite this to take an object representing ASCII equivalents of LaTeX codes and loop through all the substitutions instead.
Example object:
{
    '\\times': '*',
    '\\int': 'integral(#1)',
    // etc.
}
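A minimal sketch of the table-driven idea, assuming the substitutions run on the raw LaTeX string before tokenizing. LATEX_TO_ASCII and applySubstitutions are hypothetical names, not nerdamer API, and the table only holds commands with a direct one-to-one equivalent; commands whose arguments need reordering (such as \int) would still need separate handling.

```javascript
// Hypothetical substitution table: LaTeX command -> nerdamer-style text.
// Only commands with a direct one-to-one equivalent belong here.
const LATEX_TO_ASCII = {
  '\\times': '*',
  '\\cdot': '*',
  '\\div': '/',
  '\\pi': 'pi',
  '\\infty': 'Infinity',
};

// Escape a literal string so it can be embedded in a RegExp.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Apply every substitution in the table. The lookahead (?![a-zA-Z]) stops
// '\\pi' from also matching the start of a longer command such as '\\pix'.
function applySubstitutions(latex, table = LATEX_TO_ASCII) {
  let out = latex;
  for (const [command, replacement] of Object.entries(table)) {
    const pattern = new RegExp(escapeRegExp(command) + '(?![a-zA-Z])', 'g');
    out = out.replace(pattern, replacement);
  }
  return out;
}

console.log(applySubstitutions('2 \\times 3 \\cdot \\pi')); // 2 * 3 * pi
```

Adding a new keyword then only means adding a table entry rather than another if statement.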
On the subject of the second line of my example object, what's the current provision for statements such as \\int x dx? At the moment I see TypeError: dx[0] is undefined.
@tecosaur, you are correct. Keep in mind that the current implementation is just a very rough draft put together over my lunch break. I was curious to see if it would work and so far it seems promising.
If so, I'm liable to rewrite this to take an object representing ASCII equivalents of LaTeX codes and loop through all the substitutions instead.
Go for it. This instance was just a test, but looping through objects would definitely be better. This works fine for "keywords" and the majority of functions, but not for functions such as integral and limit, hence the if statements.
Here's my logic for the current approach. Declare \ as a prefix operator, so that it comes back as [operator, variable]. As you mentioned, another pass can then be performed to make substitutions and reorganize the arguments of certain functions. The backslash can just be bypassed and the variable substituted; most functions simply become name(arguments), etc. You can use console.log(_.pretty_print(raw_tokens)) at the beginning of LaTeX.parse to see what your input looks like.
This: {x}^{2} \cdot \mathrm{max}\left(a,b\right) - 3
outputs this: ((x), ^, (2), \, cdot, \, mathrm, (max), \, left, (a, ,, b, \, right), -, 3).
The problem becomes functions such as int and limit, since their arguments can vary. int is used for both indefinite and definite integrals, so I don't know if a blind substitution will work.
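A sketch of the bypass-and-substitute pass on the token dump above. It assumes a simplified token model (strings are tokens, nested arrays are bracketed groups) rather than nerdamer's real token objects; stitch, FORMATTING and COMMANDS are hypothetical names. int and limit would still need their own look-ahead rules on top of this.

```javascript
// Simplified model of the tokenizer output: strings are tokens and nested
// arrays are bracketed groups. An assumption for illustration only.
const FORMATTING = new Set(['left', 'right', 'big', 'Big', 'bigg', 'Bigg']);
const COMMANDS = new Map([['cdot', '*'], ['times', '*'], ['div', '/']]);

// Rebuilds an expression string, bypassing each backslash and then
// dropping, substituting or passing through the command that follows it.
function stitch(tokens) {
  const out = [];
  for (let i = 0; i < tokens.length; i++) {
    const tok = tokens[i];
    if (tok === '\\') {
      const name = tokens[++i];           // the token after the backslash
      if (FORMATTING.has(name)) continue; // pure formatting: drop it
      if (name === 'mathrm') {
        // \mathrm{max}: emit the name itself, without parentheses
        const group = tokens[++i];
        out.push(Array.isArray(group) ? stitch(group) : String(group));
      } else {
        out.push(COMMANDS.has(name) ? COMMANDS.get(name) : name);
      }
    } else if (Array.isArray(tok)) {
      out.push(`(${stitch(tok)})`);       // re-wrap bracketed groups
    } else {
      out.push(tok);
    }
  }
  return out.join('');
}

// {x}^{2} \cdot \mathrm{max}\left(a,b\right) - 3
const tokens = [['x'], '^', ['2'], '\\', 'cdot', '\\', 'mathrm', ['max'],
  '\\', 'left', ['a', ',', 'b', '\\', 'right'], '-', '3'];
console.log(stitch(tokens)); // (x)^(2)*max(a,b)-3
```

The extra parentheses around single atoms such as (x) are harmless, since precedence is left to the parser.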
I'm not married to any particular way of doing things, so if you find another solution more efficient, then let's go that route. I just took a swing at it to see where it goes.
I think my maths teacher told me that if the argument of a limit contains addition or subtraction, it must be parenthesized. Therefore, lim(x+1,x,Infinity) should be \lim_{x\to\infty}\left(x+1\right).
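One way to get that output, sketched with hypothetical helpers (hasTopLevelSum and limitToTeX are not nerdamer functions): wrap the limit's argument in \left(...\right) whenever its TeX contains a + or - outside any bracket. The check is purely textual, so it would also parenthesize cases like 2 \cdot -3 that arguably don't need it.

```javascript
// True if tex contains a + or - that is not nested inside (), {} or []
// and is not a leading sign (as in -x).
function hasTopLevelSum(tex) {
  let depth = 0;
  for (let i = 0; i < tex.length; i++) {
    const ch = tex[i];
    if ('({['.includes(ch)) depth++;
    else if (')}]'.includes(ch)) depth--;
    else if ((ch === '+' || ch === '-') && depth === 0 && tex.slice(0, i).trim() !== '') {
      return true;
    }
  }
  return false;
}

// Builds \lim_{var\to point} body, parenthesizing sums and differences.
function limitToTeX(bodyTeX, variable, pointTeX) {
  const body = hasTopLevelSum(bodyTeX) ? `\\left(${bodyTeX}\\right)` : bodyTeX;
  return `\\lim_{${variable}\\to ${pointTeX}} ${body}`;
}

console.log(limitToTeX('x+1', 'x', '\\infty'));        // \lim_{x\to \infty} \left(x+1\right)
console.log(limitToTeX('x', 'x', '\\infty') + ' + 1'); // \lim_{x\to \infty} x + 1
```

With this rule, lim(x+1,x,Infinity) and lim(x,x,Infinity)+1 no longer produce the same TeX.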
@jiggzson, the more I think about it, the more I think the approach should be split into two-ish major components.
1. A function that reads a string and extracts general LaTeX-y statements. These would take the form
\\command[optional_argument_1][optional_argument_2][...]{main_argument_1}{main_argument_2}{...}
and probably
\\command[optional_argument]_{lower_limit}^{upper_limit}
2. A method to convert identified LaTeX functions into ASCII / nerdamer equivalents.
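A sketch of component 1 under stated assumptions: readCommand, readGroup and readScript are hypothetical helpers that read one command with any number of [optional] and {main} arguments plus _ and ^ limits. Because the reader does not know each command's arity, \cdot {x} would swallow {x} as an argument; a real version would need a per-command arity table.

```javascript
// Reads a balanced group starting at s[i] (which must equal `open`).
// Returns [contents, indexAfterGroup].
function readGroup(s, i, open, close) {
  let depth = 0;
  for (let j = i; j < s.length; j++) {
    if (s[j] === open) depth++;
    else if (s[j] === close && --depth === 0) return [s.slice(i + 1, j), j + 1];
  }
  throw new Error(`Unbalanced ${open} at position ${i}`);
}

// A sub/superscript is either a {group} or a single character (as in ^5).
function readScript(s, i) {
  if (s[i] === '{') return readGroup(s, i, '{', '}');
  if (i >= s.length) throw new Error(`Missing script at position ${i}`);
  return [s[i], i + 1];
}

// Parses \name[opt]...{arg}..._{lower}^{upper} starting at s[i] === '\\'.
function readCommand(s, i) {
  const match = /^\\([a-zA-Z]+)/.exec(s.slice(i));
  if (!match) throw new Error(`No command at position ${i}`);
  const cmd = { name: match[1], optional: [], args: [], lower: null, upper: null };
  let j = i + match[0].length;
  for (;;) {
    while (s[j] === ' ') j++;
    if (s[j] === '[') {
      const [text, next] = readGroup(s, j, '[', ']');
      cmd.optional.push(text);
      j = next;
    } else if (s[j] === '{') {
      const [text, next] = readGroup(s, j, '{', '}');
      cmd.args.push(text);
      j = next;
    } else if (s[j] === '_' || s[j] === '^') {
      const [text, next] = readScript(s, j + 1);
      if (s[j] === '_') cmd.lower = text; else cmd.upper = text;
      j = next;
    } else {
      break;
    }
  }
  cmd.end = j; // index where the remaining input (e.g. a sum's body) starts
  return cmd;
}

console.log(readCommand('\\sqrt[3]{8}', 0));
```

Component 2 would then switch on cmd.name and build the nerdamer string from cmd.optional, cmd.args, cmd.lower and cmd.upper.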
@Happypig375, I'll have to look and see if the libraries you mentioned at the beginning follow that rule. That would be awesome.
@tecosaur, that's sort of what's happening right now, isn't it? The only difference is that in part 1 the \ acts as a bypass. After that, part 2 happens, which is nothing more than a loop that filters out and replaces commands. The whole thing then gets "stitched" back together. Unless I'm misunderstanding you.
@jiggzson I'll describe the example that made me think that point one isn't fully functional at the moment.
Regarding nerdamer.convertFromLaTeX( ... ).toTeX():
- \\sqrt{4} is converted to 2
- \\sqrt[3]{8} becomes something like sqrt3 \\cdot 8 when it should be 2
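For the \sqrt[3]{8} case, a string-level sketch that rewrites \sqrt{x} as sqrt(x) and \sqrt[n]{x} as nthroot(x,n). It assumes nerdamer's nthroot takes the radicand first and the index second; convertRoots and readGroup are hypothetical helpers, and the rest of the string is left untouched.

```javascript
// Reads a balanced group starting at s[i] === open; returns [contents, indexAfter].
function readGroup(s, i, open, close) {
  let depth = 0;
  for (let j = i; j < s.length; j++) {
    if (s[j] === open) depth++;
    else if (s[j] === close && --depth === 0) return [s.slice(i + 1, j), j + 1];
  }
  throw new Error(`Unbalanced ${open} at position ${i}`);
}

// Rewrites \sqrt{x} -> sqrt(x) and \sqrt[n]{x} -> nthroot(x,n), recursing
// into the radicand and the index so nested roots are converted as well.
function convertRoots(s) {
  let out = '';
  let i = 0;
  while (i < s.length) {
    // Match \sqrt but not a longer command name that starts with "sqrt"
    if (s.startsWith('\\sqrt', i) && !/[a-zA-Z]/.test(s[i + 5] || '')) {
      let j = i + 5;
      let index = null;
      if (s[j] === '[') {
        [index, j] = readGroup(s, j, '[', ']');
      }
      if (s[j] !== '{') throw new Error(`Expected { after \\sqrt at position ${j}`);
      const [radicand, next] = readGroup(s, j, '{', '}');
      const inner = convertRoots(radicand);
      out += index === null ? `sqrt(${inner})` : `nthroot(${inner},${convertRoots(index)})`;
      i = next;
    } else {
      out += s[i++];
    }
  }
  return out;
}

console.log(convertRoots('\\sqrt[3]{8}')); // nthroot(8,3)
```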
In a similar vein, here's another example showing why I think there's a need for a more robust implementation of this.
nerdamer('sum(2x,x,1,5)').toTeX() = "30"
nerdamer.convertFromLaTeX('\\sum_{x=1}^5 2x').toTeX() = "x=1"
If the implementation recognized \\sum as a LaTeX statement, extracted the limits and the text afterwards, and passed them to the converter, I imagine we could (once that's in place) fairly easily turn that into sum(2x,x,1,5).
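A sketch of that \sum handling, assuming the summand is everything after the limits and that the lower limit has the form var=start. convertSum and readScript are hypothetical helpers, not part of nerdamer.

```javascript
// A sub/superscript is either a {group} or a single character (as in ^5).
// Returns [value, indexAfter].
function readScript(s, i) {
  if (s[i] === '{') {
    let depth = 0;
    for (let j = i; j < s.length; j++) {
      if (s[j] === '{') depth++;
      else if (s[j] === '}' && --depth === 0) return [s.slice(i + 1, j), j + 1];
    }
    throw new Error(`Unbalanced { at position ${i}`);
  }
  if (i >= s.length) throw new Error(`Missing script at position ${i}`);
  return [s[i], i + 1];
}

// \sum_{var=start}^{end} body  ->  sum(body,var,start,end)
function convertSum(latex) {
  const s = latex.trim();
  if (!s.startsWith('\\sum')) throw new Error('Expected \\sum');
  let j = '\\sum'.length;
  let lower = null;
  let upper = null;
  while (s[j] === '_' || s[j] === '^') {
    const [value, next] = readScript(s, j + 1);
    if (s[j] === '_') lower = value; else upper = value;
    j = next;
  }
  const bounds = lower === null ? null : /^\s*([a-zA-Z]\w*)\s*=\s*(.+?)\s*$/.exec(lower);
  if (!bounds || upper === null) throw new Error('Expected \\sum_{var=start}^{end}');
  const body = s.slice(j).trim();
  return `sum(${body},${bounds[1]},${bounds[2]},${upper})`;
}

console.log(convertSum('\\sum_{x=1}^5 2x')); // sum(2x,x,1,5)
```

Single-character limits follow LaTeX's own rule, so ^5 works but a two-digit limit needs braces, as in ^{10}.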
I have an idea for the implementation of the second part which I'll get started on tonight. I'll let you know when I have a working prototype (should be soon).
\sqrt{4} is converted to 2
That behavior is related to nerdamer. The square root of a perfect square is always evaluated.
\sqrt[3]{8} becomes something like sqrt3 \cdot 8 when it should be 2
I see what you mean.
I have an idea for the implementation of the second part which I'll get started on tonight. I'll let you know when I have a working prototype (should be soon).
Sounds exciting. Keep me posted.
Thanks!
\sqrt{4} is converted to 2
That behavior is related to nerdamer. The square root of a perfect square is always evaluated.
To me that's desirable behaviour :grin: I'd much rather have 2 than root 4.
I'm also creating a series of tests/assertions to be used as a benchmark/scoring/progress system of sorts. So my attention is divided, but you'll get two things for the price of one :stuck_out_tongue:
Update: Testing
I've written up a few tests using the mocha framework (BDD style), and here's the current 'report card'.
I think we have room for improvement.
I'll add a few more and create a PR with this tomorrow if you like.
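A sketch of a tiny scoring harness in the same spirit, independent of mocha: each case pairs a LaTeX input with the expected evaluated result, using examples from this thread. reportCard is a hypothetical name, and the converter is passed in so the harness doesn't assume a particular nerdamer call; with nerdamer loaded it might be something like latex => nerdamer.convertFromLaTeX(latex).toString(), but that shape is an assumption.

```javascript
// [LaTeX input, expected result] pairs taken from this thread.
const CASES = [
  ['\\sqrt{4}', '2'],
  ['\\sqrt[3]{8}', '2'],
  ['\\sum_{x=1}^5 2x', '30'],
  ['2 \\bigg(1+1\\bigg)', '4'],
];

// Runs each case through `convert` and reports a score plus the failures.
// A converter that throws counts as a failure rather than aborting the run.
function reportCard(convert, cases = CASES) {
  const failures = [];
  for (const [input, expected] of cases) {
    let actual;
    try {
      actual = String(convert(input));
    } catch (err) {
      actual = `threw: ${err.message}`;
    }
    if (actual !== expected) failures.push({ input, expected, actual });
  }
  return { passed: cases.length - failures.length, total: cases.length, failures };
}

// Stub converter for demonstration: always answers "2".
const card = reportCard(() => '2');
console.log(`${card.passed}/${card.total} passed`); // 2/4 passed
```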
@tecosaur, sounds good. Will your PR include functions and a fix for the nthroot issue as well? Also, a number of those items are formatting commands, aren't they? Those don't really map to anything in nerdamer, correct?
@jiggzson Thought I'd put what I have so far up.
The reason I've put 'formatting commands' such as \bigg up is that right now they mess up the output, which, IMO, they shouldn't. See the example below.
2 \\bigg(1+1\\bigg) should give "4" but instead gives 2 \\bigg(1+1\\bigg).
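A string-level sketch of dropping such commands before parsing, assuming they never change the value of the expression. The command lists are illustrative rather than exhaustive, and stripFormatting is a hypothetical helper. Spacing commands are replaced with a plain space rather than removed, so \int x\,dx doesn't collapse into xdx.

```javascript
// Delimiter-sizing commands: \big, \Big, \bigg, \Bigg (optionally with l/r/m)
// plus \left and \right. The lookahead keeps \bigcup or \leftarrow intact.
const SIZING = /\\(?:[Bb]igg?[lrm]?|left|right)(?![a-zA-Z])/g;
// Explicit spacing: \, \; \: \! "\ " and \quad / \qquad.
const SPACING = /\\(?:[,;:! ]|q?quad(?![a-zA-Z]))/g;

function stripFormatting(latex) {
  return latex.replace(SIZING, '').replace(SPACING, ' ');
}

console.log(stripFormatting('2 \\bigg(1+1\\bigg)')); // 2 (1+1)
```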
ATM that's just the unit tests (well, more of a draft for some unit tests), but code for even more tests, and making more of those tests pass (or at least working toward that), should come along too in due course :)
@jiggzson I'm somewhat lost with regards to something and I'm hoping you can tell me what I'm missing.
I'm trying to resolve the first set of test cases I have defined (functions that are currently rendered as \\mathrm{func} but should be \\func), so I replaced the current return at the end of parser.toTeX(...) with the following:
const InbuiltLaTeXFunctions = ['arccos', 'cos', 'csc', 'exp', 'ker', 'limsup', 'min', 'sinh',
    'arcsin', 'cosh', 'deg', 'gcd', 'lg', 'ln', 'Pr', 'sup', 'arctan', 'cot', 'det', 'hom',
    'lim', 'log', 'sec', 'tan', 'arg', 'coth', 'dim', 'inf', 'liminf', 'max', 'sin', 'tanh'];
// Replace \mathrm{name} with \name for every function LaTeX already defines
const replaceInbuiltFunctions = (s) => s.replace(
    new RegExp(`\\\\mathrm\\{(${InbuiltLaTeXFunctions.join('|')})\\}`, 'g'),
    '\\$1'
);
return replaceInbuiltFunctions(TeX.join(' '));
However, this doesn't seem to have changed anything...
Update: I've discovered it's not parser.toTeX I wanted but LaTeX.value.
Other changes I'm making are successful. I'll create separate pull requests so that different sets of changes can be reviewed separately.
Previous State of Affairs
Current State of Affairs
This is with the three PRs above being accepted
@jiggzson A question.
With LaTeX new operators and commands can be defined via methods such as
\DeclareMathOperator{\sech}{sech}
and
\newcommand{\dd}{\ensuremath{\mathrm{d}}}
In my VS Code extension I plan on trying to identify such lines and add them to some sort of config. I imagine some users would also find this quite useful, as they would be able to add their own commonly used substitutions.
I'm trying to locate the relevant section for this sort of config and thought it would be a good idea to ask you.
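A sketch of the extraction side only, assuming definitions appear in the forms shown above with braced names. It records \DeclareMathOperator entries as operator names and zero-argument \newcommand entries as raw bodies, and skips anything that takes arguments. collectDefinitions and readBraced are hypothetical helpers; where nerdamer would consume the result is the open question here, so the sketch stops at building the map.

```javascript
// Reads a {...} group at s[i], skipping leading whitespace.
// Returns [contents, indexAfter], or null if there is no group there.
function readBraced(s, i) {
  while (/\s/.test(s[i] || '')) i++;
  if (s[i] !== '{') return null;
  let depth = 0;
  for (let j = i; j < s.length; j++) {
    if (s[j] === '{') depth++;
    else if (s[j] === '}' && --depth === 0) return [s.slice(i + 1, j), j + 1];
  }
  return null;
}

// Collects \DeclareMathOperator{\name}{text} and zero-argument
// \newcommand{\name}{body} definitions into a plain lookup object.
function collectDefinitions(source) {
  const defs = {};
  const re = /\\(DeclareMathOperator\*?|newcommand|renewcommand)/g;
  let m;
  while ((m = re.exec(source)) !== null) {
    const name = readBraced(source, re.lastIndex);
    if (!name) continue;
    let i = name[1];
    while (/\s/.test(source[i] || '')) i++;
    if (source[i] === '[') continue; // takes arguments, e.g. \newcommand{\f}[1]{...}
    const body = readBraced(source, i);
    if (!body) continue;
    defs[name[0].trim()] = m[1].startsWith('DeclareMathOperator')
      ? { type: 'operator', text: body[0] }
      : { type: 'macro', body: body[0] };
    re.lastIndex = body[1]; // resume scanning after the definition
  }
  return defs;
}

console.log(collectDefinitions('\\DeclareMathOperator{\\sech}{sech}'));
```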
Also, another issue I've encountered (can be seen in this image):
I don't think this is an issue with 'times', as 2 times 2 producing 4 doesn't seem too outlandish. However, other than LaTeX not doing this, there are far more commands, such as big, left and pm, where I doubt it would be the desired behaviour.
I imagine the best thing to do would be to add a 'flag' of sorts, so that the replacement is only performed when it is set.
https://github.com/jiggzson/nerdamer/blob/b7bae8427ed1e770a72e5b3028369ab43eb5969c/nerdamer.core.js#L8467-L8505
@tecosaur, do you think it's related to this (7bd13d2) commit? With that commit I added support for word operators, but I forgot to remove my test operator times.
Hmmm. I'm not sure which would be easiest to implement:
- Determine whether a slash came beforehand
- Add flags
- Concatenate the slash with the command
@jiggzson Since you've written this, I thought you might have some ideas regarding this.
Notes
I don't think it's related to a times-specific commit; here's why:
@tecosaur, I'm in the same boat as you; I'm taking ideas as well. My only suggestion was to cut time by reusing the existing tokenizer. I figured we can do this since quite a bit of the LaTeX is just formatting and can be discarded. As I mentioned before, the idea is then to declare \ as an operator, feed the string to Parser.tokenize, apply a filter pass to rearrange the tokens, and then glue it back together. Let Parser.parse worry about precedence etc.
Example
nerdamer.convertToLaTeX('integrate(x,x)');
// '\int {x}\, dx'
Parser.tokenize will then generate (\, int, (x), dx). The brackets denote an array. The slash can be discarded and int can be replaced with integrate. After int comes the integrand, neatly in an array, which can be fed back to LaTeX.parse to make sure it's formatted correctly, and dx can be stripped of the d.
If we look at \\frac{1}{2}, this produces (\, frac, (1), (2)). When encountering frac, we know that the following 2 arrays are just divided by each other.
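The same look-ahead idea for \frac, sketched on the simplified token model (strings are tokens, arrays are bracketed groups; an assumption, not nerdamer's real tokens): the two groups after frac become a parenthesized numerator and denominator. rewriteFractions is a hypothetical helper, and it recurses into groups so nested fractions work.

```javascript
// Rewrites  \ frac [num] [den]  as  (num)/(den), recursing into groups.
function rewriteFractions(tokens) {
  const out = [];
  for (let i = 0; i < tokens.length; i++) {
    const tok = tokens[i];
    if (tok === '\\' && tokens[i + 1] === 'frac') {
      const num = tokens[i + 2];
      const den = tokens[i + 3];
      if (!Array.isArray(num) || !Array.isArray(den)) {
        throw new Error('\\frac needs two bracketed groups');
      }
      out.push(`(${rewriteFractions(num)})/(${rewriteFractions(den)})`);
      i += 3; // skip frac and both groups
    } else if (Array.isArray(tok)) {
      out.push(`(${rewriteFractions(tok)})`);
    } else {
      out.push(tok);
    }
  }
  return out.join('');
}

console.log(rewriteFractions(['\\', 'frac', ['1'], ['2']])); // (1)/(2)
```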
I don't know if it's more efficient to just write everything from scratch or to go the proposed route. Starting from scratch just seems like a lot of work for a method that really only needs to handle a few cases.
I have an idea. I'll get back to you in a few minutes.
Here's my idea.
// Add slash info: flag each VARIABLE_OR_LITERAL token with whether it
// directly follows a backslash operator (i.e. whether it is a \command)
for (let tokenIndex = 0; tokenIndex < raw_tokens.length - 1; tokenIndex++) {
    const token = raw_tokens[tokenIndex];
    const nextToken = raw_tokens[tokenIndex + 1];
    if (nextToken.type === 'VARIABLE_OR_LITERAL') {
        if (token.type === 'OPERATOR' && token.value === '\\') {
            nextToken.command = true;
        } else {
            nextToken.command = false;
        }
    }
}
The idea works, but it introduces issues because it runs before the filter function. I'll investigate putting it into the filter.
Will you then be handling the commands in another pass? Another risk you run is that sometimes the \ can denote a space or an operator, but you can test for that as well.
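A sketch of that test at the string level, assuming it runs on the raw LaTeX before tokenizing: a backslash followed by letters starts a command, one followed by , ; : ! or a space is explicit spacing, \\ is a line break, and anything else (such as \{) is an escaped symbol. classifyBackslash and scanBackslashes are hypothetical helpers.

```javascript
// Classifies what the backslash at s[i] introduces. `end` is the index
// just past the whole construct, so a scanner can continue from there.
function classifyBackslash(s, i) {
  const next = s[i + 1];
  if (next === undefined) return { kind: 'error', end: i + 1 };
  if (/[a-zA-Z]/.test(next)) {
    const name = /^[a-zA-Z]+/.exec(s.slice(i + 1))[0];
    return { kind: 'command', name, end: i + 1 + name.length };
  }
  if (',;:! '.includes(next)) return { kind: 'space', end: i + 2 };
  if (next === '\\') return { kind: 'linebreak', end: i + 2 };
  return { kind: 'symbol', name: next, end: i + 2 }; // e.g. \{ \} \%
}

// Lists every backslash construct in a string.
function scanBackslashes(s) {
  const found = [];
  for (let i = 0; i < s.length; i++) {
    if (s[i] !== '\\') continue;
    const item = classifyBackslash(s, i);
    found.push(item.kind + (item.name ? `:${item.name}` : ''));
    i = item.end - 1; // the loop's i++ then lands on item.end
  }
  return found;
}

console.log(scanBackslashes('\\int x\\,dx + \\{a\\}').join(', '));
// command:int, space, symbol:{, symbol:}
```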
The way the filter pass currently works is by rebuilding the tokens array. If it sees big or \ or left etc., it just doesn't add it and continues. If it encounters a function that needs reformatting, it looks ahead and adds the pieces in the correct order to the new stack. If you find your idea easier, then play around with it and see what results you get. If it works, go for it.
I had an issue with nested commands, however that issue is now fixed, and I think I have a robust solution.
[in filterTokens]
if (typeof token.command === 'undefined' &&
        i > 0 && ['VARIABLE_OR_LITERAL', 'FUNCTION'].includes(token.type)) {
    // A token is a command if it directly follows a backslash operator
    if (tokens[i-1].type === 'OPERATOR' && tokens[i-1].value === '\\') {
        token.command = true;
    } else {
        token.command = false;
    }
}
Let me know if you can see any issues.
A few things.
Once again I did something funny with the last PR, so you got a bonus change.
~~The Travis CI error with the second commit seems to be something funny with Travis? I did the full test suite on my machine and none of the tests (except for the in-progress latex import functionality tests I added) failed.~~ Edit: Found some git-added text, removed it, everything fine now
Current State of Affairs
If you were beginning to think it looked like we were almost there:
If you (or anyone else who's interested - @Happypig375 ?) have any ideas for good test cases, please send me a PR at https://github.com/tecosaur/nerdamer/tree/dev and I'll be sure to include them :)