PySR icon indicating copy to clipboard operation
PySR copied to clipboard

[Feature] 1D integration mode

Open MilesCranmer opened this issue 2 years ago • 5 comments

Since the PySR backend now has fast autodiff set up, it seems like the obvious next step is to set up PySR to solve 1D integrals via genetic algorithms. I think the only change would be the loss function - which would use eval_diff_tree_array.

Perhaps it would make sense to create a separate class with a limited number of features, to host this functionality.

MilesCranmer avatar Jul 09 '22 16:07 MilesCranmer

Hello!

Am I right that this issue is about something like y = \int f(X(t)) dt where f(..) is some regression?

foxtran avatar Jan 01 '24 23:01 foxtran

Yep! Exactly.

MilesCranmer avatar Jan 02 '24 06:01 MilesCranmer

I think the only change would be the loss function - which would use eval_diff_tree_array.

From my point of view, it requires to update equation_search in SymbolicRegression.jl to accept X::AbstractVector{AbstractMatrix{T}} and Y::AbstractVector{AbstractMatrix{T}} with corresponding weights::AbstractVector{AbstractMatrix{T}} for numerical integration...

foxtran avatar Jan 02 '24 17:01 foxtran

Oh maybe you mean something different. The idea I had in mind can already be done with the current equation_search by passing in a custom loss function and flattened X array which is then unpacked by the custom loss function. For example, see the implementation in this thread: https://github.com/MilesCranmer/PySR/discussions/374. The idea for a 1D integration "mode" is to basically set this up automatically, somehow. Does that make more sense?

MilesCranmer avatar Jan 03 '24 06:01 MilesCranmer

Oh maybe you mean something different.

Probably... It is just from specific of my dataset in which I have a set of integrals which has different number of points.

At final, with custom loss function, I did something like this:

objective = #"""
using DataFrames

function my_custom_objective(tree, dataset::Dataset{T,L}, options) where {T,L}

    t = string(tree);
    if occursin("x1", t) || occursin("x2", t)
      return L(Inf)
    end

    point_pred, flag = eval_tree_array(tree, dataset.X, options)
    !flag && return L(Inf)

    diffs = dataset.X[2,:] .* (point_pred .- dataset.y)

    df = DataFrame(ID = dataset.X[1,:], D = diffs)
    res_df = combine(groupby(df, :ID), :D => sum)
    diffs = res_df[!, :D_sum]

    return sum(diffs .^ 2) / length(diffs)
end
#"""

In first two columns, I encoded ID of integrals and weights for numerical integration and I made an integration in such strange way via DafaFrames :) Unfortunately, batching does not work there.

foxtran avatar Jan 03 '24 11:01 foxtran