Cool integration: DimensionalData and SymbolicRegression
I just used SymbolicRegression.jl to fit an equation to a DimArray. Specifically, given a DimArray with n dimensions, I can run SymbolicRegression via a small wrapper to obtain an equation in which the dims are the independent variables and the values of the DimArray are the dependent variable.
Thought this would be cool for folks to see, and maybe we can get some form of more official interop :D
The code
using DimensionalData
import SymbolicRegression: SRRegressor
import MLJBase: machine, fit!
"""
slog(b, x)
Like `log(b, x)` but "safe" in that it will return NaN, not error, if either `b` or `x` are less than zero.
"""
slog(b, x) = (b > 0 && x > 0) ? log(b, x) : typeof(x)(NaN)
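# A couple of illustrative calls (my own examples, not from the original post):
#   slog(2, 8.0)   # == 3.0, same as log(2, 8.0)
#   slog(2, -1.0)  # == NaN, whereas log(2, -1.0) throws a DomainError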
function best_symbolic_function(benchmark_dimarray;
        niterations = 50,
        binary_operators = [+, -, *, /, ^, slog],
        unary_operators = [exp],
        kwargs...)
    # Use the dim lookups as feature columns and the array values as the target.
    dt = DimTable(benchmark_dimarray)
    X = NamedTuple{map(DimensionalData.name, DimensionalData.dims(benchmark_dimarray))}(
        DimensionalData.dimcolumns(dt) .|> x -> Float64.(x)
    )
    y = vec(benchmark_dimarray.data)
    # Any extra keyword arguments are forwarded to SRRegressor.
    mach = machine(SRRegressor(;
        niterations,
        binary_operators,
        unary_operators,
        save_to_file = false,
        kwargs...,
    ), X, y)
    fit!(mach)
    return mach
end
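The wrapper returns the fitted machine; if you just want the winning expression as a string, something along these lines should work (a sketch on my end: `report` is MLJBase's accessor, and the `equation_strings` / `best_idx` field names are assumed from SymbolicRegression's MLJ report):

import MLJBase: report

# Sketch: extract the best equation string from a fitted machine.
best_equation_string(mach) = let r = report(mach)
    r.equation_strings[r.best_idx]
end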
julia> da2 = @d X(1:10:101) .* log.(Y(1:10:101))
┌ 11×11 DimArray{Float64, 2} ┐
├────────────────────────────┴───────────────────────────────────────────────────────────────────────────────────────────────────────────── dims ┐
↓ X Sampled{Int64} 1:10:101 ForwardOrdered Regular Points,
→ Y Sampled{Int64} 1:10:101 ForwardOrdered Regular Points
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
↓ → 1 11 21 31 41 51 61 71 81 91 101
1 0.0 2.3979 3.04452 3.43399 3.71357 3.93183 4.11087 4.26268 4.39445 4.51086 4.61512
11 0.0 26.3768 33.4897 37.7739 40.8493 43.2501 45.2196 46.8895 48.3389 49.6195 50.7663
21 0.0 50.3558 63.935 72.1137 77.985 82.5683 86.3284 89.5163 92.2834 94.728 96.9175
31 0.0 74.3348 94.3802 106.454 115.121 121.887 127.437 132.143 136.228 139.837 143.069
41 0.0 98.3137 124.825 140.793 152.256 161.205 168.546 174.77 180.172 184.945 189.22
51 0.0 122.293 155.271 175.133 189.392 200.523 209.655 217.397 224.117 230.054 235.371
61 0.0 146.272 185.716 209.473 226.528 239.841 250.763 260.023 268.061 275.162 281.522
71 0.0 170.251 216.161 243.813 263.664 279.16 291.872 302.65 312.006 320.271 327.674
81 0.0 194.23 246.606 278.153 300.799 318.478 332.981 345.277 355.95 365.38 373.825
91 0.0 218.208 277.052 312.493 337.935 357.796 374.09 387.904 399.895 410.488 419.976
101 0.0 242.187 307.497 346.833 375.071 397.114 415.198 430.531 443.839 455.597 466.127
julia> mach2 = best_symbolic_function(da2)
[ Info: Training machine(SRRegressor(defaults = nothing, …), …).
[ Info: Started!
Evolving for 50 iterations... 100%|██████████████████████████████| Time: 0:00:03
[ Info: Final population:
──────────────────────────────────────────────────────
Complexity  Loss       Score      Equation
1           1.807e+04  3.604e+01  y = 178.11
3           5.869e+03  5.622e-01  y = X * 3.4923
5           0.000e+00  1.802e+01  y = X * slog(2.7183, Y)
──────────────────────────────────────────────────────
trained Machine; caches model-specific representations of data
model: SRRegressor(defaults = nothing, …)
args:
1: Source @481 ⏎ Table{AbstractVector{Continuous}}
2: Source @222 ⏎ AbstractVector{Continuous}
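Since slog(2.7183, Y) with base ≈ e is just the natural log, the recovered expression is exactly the X * log(Y) that generated da2. To double-check, you could predict with the fitted machine on the same features and compare against the original values (again a sketch, using MLJBase's `predict` and rebuilding the feature table the same way the wrapper does):

import MLJBase: predict

# Compare predictions from the best equation against the flattened DimArray values.
dt = DimTable(da2)
X = NamedTuple{map(DimensionalData.name, DimensionalData.dims(da2))}(
    DimensionalData.dimcolumns(dt) .|> x -> Float64.(x)
)
maximum(abs.(predict(mach2, X) .- vec(da2.data)))   # should be ~0 for the exact fit above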
This is really cool!