PySR
Understanding accuracy of expressions
Hello! In my recent research, I used PySR for some symbolic regression tasks. I found that PySR's loss is even smaller than an ANN's in some cases. How can I explain this "magic" of PySR? Why can a low-dimensional expression give better results than a high-dimensional network? Thanks!
Hi @prozhuyi,
Thanks for this. Yes, I also find that symbolic expressions sometimes beat neural nets on specific problems. It really has to do with priors over the space of functions. When you train a neural net, there is an implicit prior that the function will be smooth, among other properties.
Symbolic regression imposes a different prior over the space of functions. Sometimes this prior is superior to the neural-net prior, especially if the operators you are using form an efficient basis for describing your field.
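As a concrete illustration, here is a minimal sketch of how the operator set defines that prior in PySR. The dataset and the specific operator choices below are illustrative assumptions, not from this thread:

```python
import numpy as np
from pysr import PySRRegressor

# Toy data generated by a simple symbolic law (illustrative assumption):
# y = 2.5 * cos(x0) + x1^2
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 2.5 * np.cos(X[:, 0]) + X[:, 1] ** 2

# Restricting the operator set is how you express the prior:
# if "cos", "+", and "*" form an efficient basis for the target law,
# a short expression can fit better than a large network.
model = PySRRegressor(
    niterations=40,
    binary_operators=["+", "*"],
    unary_operators=["cos"],
)
model.fit(X, y)
print(model)
```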
cheers, Miles
I think I understand now! Thank you for your answer!
Hello Miles @MilesCranmer,
I recently had another question, about the "score" given by PySR. How is this score obtained? Is it computed by this step?
```python
if lastMSE is None:
    cur_score = 0.0
else:
    if curMSE > 0.0:
        # TODO Move this to more obvious function/file.
        cur_score = -np.log(curMSE / lastMSE) / (curComplexity - lastComplexity)
    else:
        cur_score = np.inf
```
And what is its significance? Thanks again!
Yes, that is the score. It is basically a heuristic that looks for sharp decreases in loss as complexity increases (a traditional criterion for the "best" equation in SR). There are more details on this in the PySR paper: https://arxiv.org/abs/2305.01582
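To make the heuristic concrete, here is a minimal self-contained sketch of that computation applied to a hypothetical loss-vs-complexity table (the numbers are made up for illustration):

```python
import numpy as np

# Hypothetical Pareto front of (complexity, MSE) pairs, made up for illustration.
front = [(1, 1.00), (3, 0.90), (5, 0.10), (8, 0.09)]

last_mse, last_complexity = None, None
for complexity, mse in front:
    if last_mse is None:
        score = 0.0
    elif mse > 0.0:
        # A large score means a big drop in log-loss per unit of added complexity.
        score = -np.log(mse / last_mse) / (complexity - last_complexity)
    else:
        score = np.inf
    print(f"complexity={complexity}, mse={mse:.2f}, score={score:.3f}")
    last_mse, last_complexity = mse, complexity
```

In this made-up example, the jump from complexity 3 to 5 (MSE 0.90 to 0.10) gets by far the largest score, so that equation would be flagged as the "best" one.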
Hi @MilesCranmer,
This is a very interesting discussion. Just to elaborate on your answer a little more (correct me if I am wrong):
An ANN assumes a prior over the space of smooth functions (among other properties), whereas symbolic regression also allows non-smooth functions, which can sometimes be a more suitable prior for a particular problem.
Is the above statement correct?