
Understanding accuracy of expressions

Open zhuyi-bjut opened this issue 1 year ago • 5 comments

Hello! In my recent research, I used PySR for some symbolic regression tasks, and I found that PySR's loss is sometimes even smaller than an ANN's. How can I explain this magic of PySR? Why can a low-dimensional expression outperform a high-dimensional network? Thanks!

zhuyi-bjut avatar Nov 14 '23 05:11 zhuyi-bjut

Hi @zhuyi-bjut,

Thanks for this. Yes, I also find that symbolic expressions sometimes beat neural nets on specific problems. It really comes down to priors over the space of functions: training a neural net imposes an implicit prior that the function is smooth, among other properties.

Symbolic regression imposes a different prior over the space of functions. Sometimes that prior is superior to the neural net's, especially if the operators you are using form an efficient basis for describing your field.

cheers, Miles

MilesCranmer avatar Nov 14 '23 08:11 MilesCranmer

I think I understand now. Thank you for your answer!

zhuyi-bjut avatar Nov 14 '23 08:11 zhuyi-bjut

hello Miles @MilesCranmer

I recently had another question about the 'score' given by PySR. How is this score obtained? Is it computed by this step?

```python
if lastMSE is None:
    cur_score = 0.0
else:
    if curMSE > 0.0:
        # TODO Move this to more obvious function/file.
        cur_score = -np.log(curMSE / lastMSE) / (curComplexity - lastComplexity)
    else:
        cur_score = np.inf
```

And what is its significance? Thanks again!

zhuyi-bjut avatar Nov 16 '23 15:11 zhuyi-bjut

Yes, that is the score. It is basically a heuristic that looks for sharp decreases in loss as complexity increases (a traditional criterion for selecting the "best" equation in SR). There are more details on this in the PySR paper: https://arxiv.org/abs/2305.01582
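For anyone following along, the idea can be sketched as a standalone function applied to a Pareto front of (complexity, loss) pairs. This is an illustrative reimplementation of the snippet quoted above, not PySR's actual internals; the function name `compute_scores` and the example numbers are made up for demonstration:

```python
import numpy as np

def compute_scores(complexities, losses):
    """Score each expression on a Pareto front by how sharply the loss
    drops per unit increase in complexity (higher score = better trade-off)."""
    scores = []
    last_complexity = None
    last_loss = None
    for complexity, loss in zip(complexities, losses):
        if last_loss is None:
            # First (simplest) expression has nothing to compare against.
            scores.append(0.0)
        elif loss > 0.0:
            # Log-ratio of consecutive losses, normalized by the added complexity.
            scores.append(-np.log(loss / last_loss) / (complexity - last_complexity))
        else:
            # A perfect fit gets an infinite score.
            scores.append(np.inf)
        last_complexity = complexity
        last_loss = loss
    return scores

# Hypothetical Pareto front: the loss drops sharply between complexity 3 and 5,
# so the complexity-5 expression gets the highest score.
scores = compute_scores([1, 3, 5, 7], [1.0, 0.5, 0.01, 0.009])
```

The expression with the maximum score is the one where adding complexity bought the largest relative reduction in loss, which is why it is often taken as the "best" equation.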

MilesCranmer avatar Nov 16 '23 22:11 MilesCranmer

Hi @MilesCranmer ,

It is a very interesting discussion. Just to elaborate on your answer a little, and correct me if I am wrong:

The ANN assumes a prior over the space of smooth functions (with other properties), whereas symbolic regression also allows non-smooth functions, which can sometimes be a more suitable prior for a particular problem.

Is the above statement correct?

tanweer-mahdi avatar Nov 26 '23 23:11 tanweer-mahdi