SymbolicRegression.jl
findFormula function
I do not know symbolic regression in detail; however, I find it very useful. It would be nice if there were a function like:
findFunction(x, y) # where x and y are vectors and returns a SymbolicUtils/Symbolics expression
So that someone who has two vectors and wants a formula could use this package without knowing the details.
How about this?
using SymbolicRegression
function find_expression(X::AbstractArray{T,2}, y::AbstractArray{T,1}; kwargs...) where T<:Real
    # All keyword arguments are forwarded to Options
    options = Options(; kwargs...)
    hall_of_fame = EquationSearch(X, y, niterations=40, options=options, multithreading=true)
    dominating = calculate_pareto_frontier(X, y, hall_of_fame, options)
    # Convert the last equation on the Pareto frontier to a symbolic expression
    eqn = node_to_symbolic(dominating[end].tree, options)
    return eqn
end
You can run it with, e.g.,
X = randn(Float64, 5, 100)
y = cos.(X[2, :] .* X[3, :] .* X[1, :])
eqn = find_expression(X, y, unary_operators=(cos,))
If that works for you, I can add it. Just make sure to start Julia with --threads=auto so that it can take advantage of all your cores.
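As a small sketch of the threading advice above (base Julia only; the variable name `use_multithreading` is just an illustration), you can check how many threads the session actually has before asking for a multithreaded search:

```julia
# Number of threads this Julia session was started with
# (1 unless Julia was launched with --threads=... or JULIA_NUM_THREADS is set).
nthreads = Threads.nthreads()
println("Julia is running with ", nthreads, " thread(s)")

# Illustrative flag: only request multithreading when it can actually help.
use_multithreading = nthreads > 1
```

The flag could then be passed as `multithreading=use_multithreading` instead of a hard-coded `true`.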
Is there a cleaner way to pass parameters for Options and EquationSearch at the same time? Right now this only lets you configure Options.
I am not an experienced programmer but my attempt is:
function findFunction(
    X::AbstractVecOrMat{T}, y::AbstractVector{T};  # X may be a vector or a matrix
    binary_op=(+, -, *, /, ^), unary_op=(sin, cos, tan, exp), n_pop=20,
    n_iter=40, multi_threading=true  # add other parameters
) where T<:Real
    @assert !isempty(X) "X is empty"
    # Treat a vector as a single feature, i.e. a 1×n matrix
    x = X isa AbstractVector ? reshape(X, 1, length(X)) : X
    options = Options(binary_operators=binary_op, unary_operators=unary_op, npopulations=n_pop)
    hall_of_fame = EquationSearch(x, y, niterations=n_iter, options=options, multithreading=multi_threading)
    dominating = calculate_pareto_frontier(x, y, hall_of_fame, options)
    # If there is only one variable, name it "x" instead of "x1"
    X isa AbstractVector ? node_to_symbolic(dominating[end].tree, options, varMap=["x"]) :
        node_to_symbolic(dominating[end].tree, options)
end
Problems:
1) I don't know which unary and binary operators, npopulations and niterations to use by default. You should choose reasonable values.
2) I noticed that the node_to_symbolic function returns a LiteralReal. For example, simplify(x1^1) does not work.
3) Perhaps it would be nice to have a similar function that returns a callable Julia function (after simplify(expr)).
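The vector-to-matrix step in findFunction can also be written with dispatch rather than an `isa` check. This is only a sketch in base Julia: `as_feature_matrix` is a hypothetical helper, not part of SymbolicRegression.jl, and it assumes the features × samples layout used in the examples above.

```julia
# Hypothetical helper (not part of SymbolicRegression.jl):
# a vector is treated as one feature with n samples, i.e. a 1×n matrix.
as_feature_matrix(X::AbstractVector) = reshape(X, 1, length(X))
# A matrix is assumed to already be features × samples and is returned as-is.
as_feature_matrix(X::AbstractMatrix) = X

a = [1.0, 2.0, 3.0]
size(as_feature_matrix(a))   # (1, 3)

# A view is an AbstractVector but not a Vector, so it is still reshaped.
v = view(a, 1:2)
size(as_feature_matrix(v))   # (1, 2)
```

Dispatching on AbstractVector also covers views and ranges, which a check against the concrete type Vector would miss.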
I asked on Discourse and you can pass parameters for Options and EquationSearch in this way:
function findFunction(X::AbstractVecOrMat{T}, y::AbstractVector{T};
    options_args=(binary_operators=(+, -, *, /, ^), unary_operators=(sin, cos, exp), npopulations=20),
    equations_search_args=(niterations=40, multithreading=true)
) where T<:Real
    @assert !isempty(X) "X is empty"
    # Treat a vector as a single feature, i.e. a 1×n matrix
    x = X isa AbstractVector ? reshape(X, 1, length(X)) : X
    # Each NamedTuple is splatted into the keyword arguments of its own call
    options = Options(; options_args...)
    hall_of_fame = EquationSearch(x, y; options=options, equations_search_args...)
    dominating = calculate_pareto_frontier(x, y, hall_of_fame, options)
    node_to_symbolic(dominating[end].tree, options)
end
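One thing to be aware of with this pattern: passing `options_args` replaces the whole default tuple, so any default you leave out is dropped. A possible variation, sketched here in base Julia only, merges the user's tuple into the defaults. `fake_options`, `default_options_args` and `build_options` are hypothetical stand-ins for illustration; `fake_options` simply returns its keyword arguments in place of the real Options constructor.

```julia
# Stand-in for Options(; kwargs...): returns the keyword arguments as a NamedTuple.
fake_options(; kwargs...) = (; kwargs...)

# Illustrative defaults, copied from the function above.
default_options_args = (
    binary_operators=(+, -, *, /, ^),
    unary_operators=(sin, cos, exp),
    npopulations=20,
)

function build_options(; options_args=NamedTuple())
    # Entries in options_args override the defaults; the remaining defaults are kept.
    merged = merge(default_options_args, options_args)
    return fake_options(; merged...)
end

opts = build_options(options_args=(npopulations=10,))
opts.npopulations      # 10, overridden by the caller
opts.unary_operators   # (sin, cos, exp), default kept
```

The same `merge` call would work for the EquationSearch arguments.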
This works:
findFunction(a, b) # a::Vector, b::Vector
findFunction(reshape(a, 1, length(a)), b) # Matrix, b::Vector
findFunction(a, b; options_args=(binary_operators=(+, -, *, /, ^), unary_operators=(sin, cos), npopulations=10),
equations_search_args=(niterations=10, multithreading=false)) # a::Vector, b::Vector, arguments for Options and EquationSearch
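A caveat when passing only one argument this way, shown with base Julia only: a one-element NamedTuple needs a trailing comma (or a leading semicolon). Without it, the parentheses are just grouping around an assignment.

```julia
# Trailing comma: this is a NamedTuple and can be splatted as keyword arguments.
one_arg = (niterations=10,)
one_arg isa NamedTuple        # true
keys(one_arg)                 # (:niterations,)

# A leading semicolon also builds a NamedTuple.
also_one_arg = (; niterations=10)
also_one_arg isa NamedTuple   # true

# No comma: this assigns 10 to a variable called niterations and returns 10.
not_a_tuple = (niterations=10)
not_a_tuple isa NamedTuple    # false
```

So a call such as findFunction(a, b; equations_search_args=(niterations=10,)) needs the comma to behave as intended.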