Evaluation cache
Here is a proposal for implementing a cache expression. This should reduce the evaluation time for expressions that are evaluated many times. The idea is to keep a dictionary inside the Evaluation object, whose keys are the hashes of the expressions. The dict stores as values both the input as the output expression. Also, the implementation handles the caching of subexpressions.
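As a rough sketch, the lookup could work along these lines. This assumes a simplified Evaluation class; the attribute name cached_results and the helper evaluate_uncached are hypothetical and not part of the actual Mathics API:

class Evaluation:
    def __init__(self):
        # Maps hash(expression) -> (input expression, output expression).
        self.cached_results = {}

    def evaluate(self, expr):
        key = hash(expr)
        hit = self.cached_results.get(key)
        # The input is stored next to the output so a hash collision can be
        # detected by comparing the cached input against the new expression.
        if hit is not None and hit[0] == expr:
            return hit[1]
        result = self.evaluate_uncached(expr)
        self.cached_results[key] = (expr, result)
        return result

    def evaluate_uncached(self, expr):
        # Stand-in for the normal rewrite/evaluation loop.
        raise NotImplementedError

Since evaluate is called recursively, subexpressions would be cached through the same dictionary.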
@mmatera:
There is something I broke in the scanner which I think will cause all tests to fail with something like:
self = <test.test_parser.test_parser.GeneralTests testMethod=testFunction>
def testFunction(self):
self.check("x &", Node("Function", Symbol("x")))
> self.check("x \\[Function] y", "Function[x, y]")
test/test_parser/test_parser.py:240:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
test/test_parser/test_parser.py:29: in check
expr1 = self.parse(expr1)
test/test_parser/test_parser.py:25: in parse
return self.parser.parse(SingleLineFeeder(s))
mathics/core/parser/parser.py:66: in parse
return self.parse_e()
mathics/core/parser/parser.py:103: in parse_e
result.append(self.parse_exp(0))
mathics/core/parser/parser.py:133: in parse_exp
child = self.parse_exp(q + 1)
mathics/core/parser/parser.py:112: in parse_exp
result = self.parse_p()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <mathics.core.parser.parser.Parser object at 0x12e9fe710>
def parse_p(self):
token = self.next_noend()
tag = token.tag
method = getattr(self, "p_" + tag, None)
if method is not None:
return method(token)
elif tag in prefix_ops:
self.consume()
q = prefix_ops[tag]
child = self.parse_exp(q)
return Node(tag, child)
else:
self.tokeniser.sntx_message(token.pos)
> raise InvalidSyntaxError()
E mathics_scanner.errors.InvalidSyntaxError
mathics/core/parser/parser.py:156: InvalidSyntaxError
=========================== short test summary info ============================
FAILED test/test_parser/test_parser.py::GeneralTests::testFunction - mathics_...
============= 1 failed, 253 passed, 8 xfailed in 115.78s (0:01:55) =============
I am looking at this now. One short-term solution is to ignore this test here. In CI we could also go back to an older version of the scanner, but this is tricky because there was a bug in adding \[IndentingNewline]
and handling that properly in mathicsscript.
@mmatera For now I have pinned Mathics_Scanner at 1.0.0 in master, which should work for now. When that is in, you might have to rebase or merge your PR.
Before undertaking such a pervasive change, it would be interesting to get a rough idea of how much improvement we are going to get.
For example, maybe come up with some scenarios based on specific examples and show how much time we gain using this.
This is an experiment: I know that WMA implements something similar, and the reason is that it helps to avoid keeping many identical Expression objects, as well as reducing the time spent applying replacement rules to the same expression. It could also help reduce the time spent comparing identical objects.
My implementation is incomplete (apart from the obvious fact that it is not working) because I haven't thought about a mechanism to decide when to remove elements from the cache, so even if this passes the tests, I shouldn't merge it.
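For reference, one possible eviction mechanism would be a size-bounded LRU policy. The sketch below is only an assumption about how eviction could work, not what this PR implements:

from collections import OrderedDict

class BoundedExpressionCache:
    # Keeps at most max_entries items; the least recently used entry
    # is evicted when the limit is exceeded.
    def __init__(self, max_entries=10000):
        self._data = OrderedDict()
        self.max_entries = max_entries

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.max_entries:
            self._data.popitem(last=False)  # drop the oldest entry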
Ok - thanks for the explanations and clarifications.
This is a result of one possible experiment:
In[1]:= Timing[N[Pi,100];]
Out[1]= {0.013185, Null}
In[2]:= Timing[N[Pi,100];]
Out[2]= {0.000101354, Null}
In[3]:= Timing[Table[N[Pi^k,100],{10}];]
Out[3]= {0.0105746, Null}
In[4]:= Timing[Table[N[Pi^k,100],{10}];]
Out[4]= {0.000143036, Null}
Very nice.
The concept is good. We should huddle over the implementation strategies though.
Also, I should mention that next weekend I will start looking at comparisons and try to come up with a slightly different organization of the code, keeping the essential logic inside the comparisons the same.
@rocky and @mmatera what still needs to be done here? If it still needs work, I have an interest.
I can try to do the rebase. In any case, this was an experiment.
I can try to do the rebase.
Ok.
In any case, this was an experiment.
So a very performant experiment!
I'd like to look at the whole performance issue in detail sometime. The last time I looked at this, it felt more like a shot in the dark rather than looking at what's slow and attacking that.
I think the problem is that there is not a single performance issue, but several. This PR tries to implement something that exists in WMA: the expression cache. If you want to evaluate the same complicated expression many times, and it always produces the same result, the best thing is to store and reuse the result. This does not resolve another huge performance issue: expression matching is quite slow. But OK, let's keep taking baby steps...
It's not that I don't think an expression cache will help; in fact I do. However, the implementation that I saw previously might have been a bit lacking. Ideally we'd come up with a bunch of ideas at a high level, kick those around, and then drill down from there.
Although I do many things incrementally, I don't think some things should be done as incremental improvements: come up with some kind of cache, see if you can improve it a little, and so on.
Up front, various performance measurements and estimates should be done.
@rocky and @mmatera what do you think should be done next for performance? You already mentioned that expression matching is slow, but what can be done to increase its performance?
@rocky and @mmatera what do you think should be done next for performance?
Get performance benchmarks, pinpoint exactly where things are slow, and figure out why. You can use this draft to see where it stands in speeding things up.
As an example, build this with Pyston and run the unit tests. I got a 30% speedup which is what the Pyston folks said to expect. Try running it on the doctests. Is the speedup the same? Try using this draft to see what the speedup is.
What is the speedup using PyPy 3.7?
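One way to get such numbers is to profile an evaluation with cProfile and see where the cumulative time goes. The sketch below assumes the MathicsSession helper from mathics.session; adjust the import if your version exposes it elsewhere:

import cProfile
import pstats

from mathics.session import MathicsSession  # assumed entry point

session = MathicsSession()

profiler = cProfile.Profile()
profiler.enable()
session.evaluate("Table[N[Pi^k, 100], {k, 1, 50}]")
profiler.disable()

# Show the 20 call sites with the largest cumulative time.
pstats.Stats(profiler).sort_stats("cumulative").print_stats(20)

Running the same script under CPython, Pyston, and PyPy, with and without this draft, would give directly comparable numbers.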
But please, if you can't do this on your own, leave it. There are many, many other things you can do totally on your own.