SymbolicRegression.jl

findFormula function

Open qwertyjl opened this issue 3 years ago • 3 comments

I do not know symbolic regression in detail, but I find it very useful. It would be nice if there were a function like:

findFunction(x, y) # x and y are vectors; returns a SymbolicUtils/Symbolics expression

So that someone who has two vectors and wants a formula could use this package without knowing the details.

qwertyjl avatar Sep 08 '22 12:09 qwertyjl

How about this?

using SymbolicRegression

function find_expression(X::AbstractArray{T,2}, y::AbstractArray{T,1}; kwargs...) where T<:Real
    options = Options(;kwargs...)
    hall_of_fame = EquationSearch(X, y, niterations=40, options=options, multithreading=true)
    dominating = calculate_pareto_frontier(X, y, hall_of_fame, options)
    eqn = node_to_symbolic(dominating[end].tree, options)
    return eqn
end

You can run it with, e.g.,

X = randn(Float64, 5, 100)
y = cos.(X[2, :] .* X[3, :] .* X[1, :])
eqn = find_expression(X, y, unary_operators=(cos,))

If that works for you, I can add it. Just make sure to start Julia with --threads=auto so that it can take advantage of all your cores.
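A quick way to confirm the thread count from inside the session is sketched below. It uses only Base Julia; the variable name `use_threads` is made up for illustration and is not part of the package.

```julia
# Run this in a session started with `julia --threads=auto`.
n = Threads.nthreads()   # number of threads available to this session

if n == 1
    @warn "Julia is running single-threaded; restart with --threads=auto"
end

# Hypothetical flag that could be forwarded as `multithreading=use_threads`.
use_threads = n > 1
```

If the warning appears, the search would still run, just on a single core.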

Is there a cleaner way to pass parameters for Options and EquationSearch at the same time? Right now this only lets you configure Options.

MilesCranmer avatar Sep 08 '22 13:09 MilesCranmer

I am not an experienced programmer but my attempt is:

function findFunction(
    X::AbstractVecOrMat{T}, y::AbstractVector{T};    # X takes both a vector and a matrix 
    binary_op=(+, -, *, /, ^), unary_op=(sin, cos, tan, exp), n_pop=20, 
    n_iter=40, multi_threading=true        # add other parameters
) where T<:Real

    @assert !isempty(X) "X is empty"

    # Check against AbstractVector so that ranges and views are reshaped too
    x = X isa AbstractVector ? reshape(X, 1, length(X)) : X
    options = Options(binary_operators=binary_op, unary_operators=unary_op, npopulations=n_pop)
    hall_of_fame = EquationSearch(x, y, niterations=n_iter, options=options, multithreading=multi_threading)
    dominating = calculate_pareto_frontier(x, y, hall_of_fame, options)
    X isa AbstractVector ? node_to_symbolic(dominating[end].tree, options, varMap=["x"]) : # if there is only one variable it returns "x" instead of "x1"
                           node_to_symbolic(dominating[end].tree, options)
end

Problems:
1) I don't know which unary and binary operators, npopulations, and niterations to use as defaults. You should choose reasonable values.
2) I noticed that the node_to_symbolic function returns a LiteralReal. For example, simplify(x1^1) does not work.
3) Perhaps it would be nice to have a similar function that returns a callable Julia function (after simplify(expr)).

qwertyjl avatar Sep 08 '22 16:09 qwertyjl

I asked on Discourse, and you can pass parameters for both Options and EquationSearch as NamedTuples that are splatted into each call:

function findFunction(X::AbstractVecOrMat{T}, y::AbstractVector{T}; 
    options_args=(binary_operators=(+, -, *, /, ^), unary_operators=(sin, cos, exp), npopulations=20), 
    equations_search_args=(niterations=40, multithreading=true)
) where T<:Real
    @assert !isempty(X) "X is empty"

    x = X isa AbstractVector ? reshape(X, 1, length(X)) : X
    options = Options(;options_args...)
    hall_of_fame = EquationSearch(x, y; options=options, equations_search_args...)
    dominating = calculate_pareto_frontier(x, y, hall_of_fame, options)
    node_to_symbolic(dominating[end].tree, options)
end

This works:

findFunction(a, b) # a::Vector, b::Vector
findFunction(reshape(a, 1, length(a)), b) # Matrix, b::Vector
findFunction(a, b; options_args=(binary_operators=(+, -, *, /, ^), unary_operators=(sin, cos), npopulations=10),
                  equations_search_args=(niterations=10, multithreading=false)) # a::Vector, b::Vector, arguments for Options and EquationSearch
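One caveat with this pattern: passing `options_args=(...)` replaces the whole default NamedTuple, so any default left out of the call is dropped rather than kept. Below is a minimal sketch of merging user overrides into the defaults with Base `merge`. The functions `fake_options`, `fake_search`, and `find_function_sketch` are hypothetical stand-ins that only echo their keyword arguments; they are not part of SymbolicRegression.jl.

```julia
# Stand-ins that simply return the keyword arguments they receive.
fake_options(; kwargs...) = (; kwargs...)
fake_search(x, y; options, kwargs...) = (options=options, search=(; kwargs...))

function find_function_sketch(x, y; options_args=NamedTuple(), search_args=NamedTuple())
    default_options = (binary_operators=(+, -, *, /), unary_operators=(sin, cos), npopulations=20)
    default_search = (niterations=40, multithreading=true)

    # In `merge`, later entries win, so user values override defaults key by key.
    opts = fake_options(; merge(default_options, options_args)...)
    return fake_search(x, y; options=opts, merge(default_search, search_args)...)
end

# A single-entry NamedTuple needs the trailing comma: (niterations=10,)
result = find_function_sketch([1.0, 2.0], [2.0, 4.0]; search_args=(niterations=10,))
```

Here only `niterations` is overridden, while `multithreading` and all of the option defaults are kept.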

qwertyjl avatar Sep 09 '22 09:09 qwertyjl