SymbolicRegression.jl
SymbolicRegression.jl copied to clipboard
Evaluating trees with features not in dataset
Hi, im working on a process that uses the equation returned in the Pareto Frontier.
I was playing around with it and found the following:
Setting a tree like:
and evaluating with a dataset with 6 features, gives no problem, as expected.

but if i create a tree that uses more features than in the original dataset like:

I expected an error, but the function eval_tree_array gave an output and completion ==true.
I've realized this will not impact what I'm doing as trees made by the program will never (right?) have more features than the dataset, but I suppose its interesting enough of a bug to share.
Hi @Jgmedina95,
Thanks for raising this! So this should be because @inbounds is used throughout the evaluation to have faster compute. I make an in-bounds assumption since trees made during the search should never have different features than available in the dataset. However, as you raise, I don't think it hurts to check before running the evaluation, so maybe we could do that. (it could be that the user could save their search state, then re-run with a different dataset, and be surprised when trees access undefined memory?)
I just looked at the code, and it seems like sometimes @inbounds is actually not used, such as:
https://github.com/MilesCranmer/SymbolicRegression.jl/blob/6075f13c3e8fbb8b16686a6f7c1157f0235174ee/src/EvaluateEquation.jl#L163
Whereas, @inbounds is used in the functions which fuse operators together, like here:
https://github.com/MilesCranmer/SymbolicRegression.jl/blob/6075f13c3e8fbb8b16686a6f7c1157f0235174ee/src/EvaluateEquation.jl#L189
So perhaps for some trees, this would raise an error, and sometimes it would not. Will think more whether we should add a bounds check.
Best, Miles