Cool integration: DimensionalData and SymbolicRegression
I just used SymbolicRegression.jl to fit an equation to a DimArray. Specifically, given a DimArray with n dimensions, I can run SymbolicRegression via a small wrapper to obtain an equation in which the dims are the independent variables and the values of the DimArray are the dependent variable.
Thought this would be cool for folks to see, and maybe we can get some form of more official interop :D
The code
using DimensionalData
import SymbolicRegression: SRRegressor
import MLJBase: machine, fit!
"""
slog(b, x)
Like `log(b, x)` but "safe" in that it will return NaN, not error, if either `b` or `x` are less than zero.
"""
slog(b, x) = (b > 0 && x > 0) ? log(b, x) : typeof(x)(NaN)
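# A couple of illustrative calls (my own examples, not from the original post):
#   slog(2, 8.0)   # == 3.0, same as log(2, 8.0)
#   slog(2, -1.0)  # == NaN, whereas log(2, -1.0) throws a DomainError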
function best_symbolic_function(benchmark_dimarray;
        niterations = 50,
        binary_operators = [+, -, *, /, ^, slog],
        unary_operators = [exp],
        kwargs...)
    # Use the dim lookups as feature columns and the array values as the target.
    dt = DimTable(benchmark_dimarray)
    X = NamedTuple{map(DimensionalData.name, DimensionalData.dims(benchmark_dimarray))}(
        DimensionalData.dimcolumns(dt) .|> x -> Float64.(x)
    )
    y = vec(benchmark_dimarray.data)
    # Any extra keyword arguments are forwarded to SRRegressor.
    mach = machine(SRRegressor(;
        niterations,
        binary_operators,
        unary_operators,
        save_to_file = false,
        kwargs...,
    ), X, y)
    fit!(mach)
    return mach
end
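The wrapper returns the fitted machine; if you just want the winning expression as a string, something along these lines should work (a sketch on my end: `report` is MLJBase's accessor, and the `equation_strings` / `best_idx` field names are assumed from SymbolicRegression's MLJ report):

import MLJBase: report

# Sketch: extract the best equation string from a fitted machine.
best_equation_string(mach) = let r = report(mach)
    r.equation_strings[r.best_idx]
end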
julia> da2 = @d X(1:10:101) .* log.(Y(1:10:101))
┌ 11×11 DimArray{Float64, 2} ┐
├────────────────────────────┴───────────────────────────────────────────────────────────────────────────────────────────────────────────── dims ┐
↓ X Sampled{Int64} 1:10:101 ForwardOrdered Regular Points,
→ Y Sampled{Int64} 1:10:101 ForwardOrdered Regular Points
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
↓ → 1 11 21 31 41 51 61 71 81 91 101
1 0.0 2.3979 3.04452 3.43399 3.71357 3.93183 4.11087 4.26268 4.39445 4.51086 4.61512
11 0.0 26.3768 33.4897 37.7739 40.8493 43.2501 45.2196 46.8895 48.3389 49.6195 50.7663
21 0.0 50.3558 63.935 72.1137 77.985 82.5683 86.3284 89.5163 92.2834 94.728 96.9175
31 0.0 74.3348 94.3802 106.454 115.121 121.887 127.437 132.143 136.228 139.837 143.069
41 0.0 98.3137 124.825 140.793 152.256 161.205 168.546 174.77 180.172 184.945 189.22
51 0.0 122.293 155.271 175.133 189.392 200.523 209.655 217.397 224.117 230.054 235.371
61 0.0 146.272 185.716 209.473 226.528 239.841 250.763 260.023 268.061 275.162 281.522
71 0.0 170.251 216.161 243.813 263.664 279.16 291.872 302.65 312.006 320.271 327.674
81 0.0 194.23 246.606 278.153 300.799 318.478 332.981 345.277 355.95 365.38 373.825
91 0.0 218.208 277.052 312.493 337.935 357.796 374.09 387.904 399.895 410.488 419.976
101 0.0 242.187 307.497 346.833 375.071 397.114 415.198 430.531 443.839 455.597 466.127
julia> mach2 = best_symbolic_function(da2)
[ Info: Training machine(SRRegressor(defaults = nothing, …), …).
[ Info: Started!
Evolving for 50 iterations... 100%|██████████████████████████████| Time: 0:00:03
[ Info: Final population:
──────────────────────────────────────────────────────
Complexity  Loss       Score      Equation
1           1.807e+04  3.604e+01  y = 178.11
3           5.869e+03  5.622e-01  y = X * 3.4923
5           0.000e+00  1.802e+01  y = X * slog(2.7183, Y)
──────────────────────────────────────────────────────
trained Machine; caches model-specific representations of data
model: SRRegressor(defaults = nothing, …)
args:
1: Source @481 ⏎ Table{AbstractVector{Continuous}}
2: Source @222 ⏎ AbstractVector{Continuous}
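Since slog(2.7183, Y) with base ≈ e is just the natural log, the recovered expression is exactly the X * log(Y) that generated da2. To double-check, you could predict with the fitted machine on the same features and compare against the original values (again a sketch, using MLJBase's `predict` and rebuilding the feature table the same way the wrapper does):

import MLJBase: predict

# Compare predictions from the best equation against the flattened DimArray values.
dt = DimTable(da2)
X = NamedTuple{map(DimensionalData.name, DimensionalData.dims(da2))}(
    DimensionalData.dimcolumns(dt) .|> x -> Float64.(x)
)
maximum(abs.(predict(mach2, X) .- vec(da2.data)))   # should be ~0 for the exact fit above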
This is really cool!