MLJ.jl
Proposed model
Suppose I got a new dataset in the mail today and want to see which brand-name distribution from Distributions.jl best fits it.
using Distributions, Random, HypothesisTests;
Uni = subtypes(UnivariateDistribution)
#Cts_Uni = subtypes(ContinuousUnivariateDistribution)
DGP_True = LogNormal(17,7);
Random.seed!(123);
const d_train = rand(DGP_True, 1_000)
const d_test = rand(DGP_True, 1_000)
Er = []; D_fit = [];
for d in Uni
    println(d)
    try
        D̂ = fit(d, d_train)  # d is already a type, no need to parse/eval its name
        Score = [loglikelihood(D̂, d_test),
                 OneSampleADTest(d_test, D̂) |> pvalue,
                 ApproximateOneSampleKSTest(d_test, D̂) |> pvalue,
                 ExactOneSampleKSTest(d_test, D̂) |> pvalue,
                 #PowerDivergenceTest(d_test, lambda=1) not working!
                 JarqueBeraTest(d_test) |> pvalue  # only meaningful for Normal
                ];
        #Score = loglikelihood(D̂, d_test)  #TODO: compute a better score.
        push!(D_fit, [d, D̂, Score])
    catch e
        println((d, e))
        push!(Er, (d, e))
    end
end
a = hcat(D_fit...)
M_names = a[1, :]; M_fit = a[2, :]; M_scores = a[3, :];
idx = sortperm(M_scores, rev = true);  # score vectors compare lexicographically, so log-likelihood dominates
Dfit_sort = hcat(M_names[idx], M_fit[idx], M_scores[idx])
julia> Dfit_sort
11×3 Array{Any,2}:
LogNormal … [-20600.7, 0.823809, 0.789128, 0.781033, 0.0]
Gamma [-21159.4, 6.0e-7, 2.45426e-68, 1.23247e-69, 0.0]
Cauchy [-24823.3, 6.0e-7, 2.91142e-213, 8.6107e-227, 0.0]
InverseGaussian [-26918.1, 6.0e-7, 0.0, 0.0, 0.0]
Exponential [-33380.3, 6.0e-7, 0.0, 0.0, 0.0]
Normal … [-40611.5, 6.0e-7, 1.32495e-213, 3.51792e-227, 0.0]
Rayleigh [-61404.6, 6.0e-7, 0.0, 0.0, 0.0]
Laplace [-2.03419e9, 6.0e-7, 1.49234e-138, 5.47197e-144, 0.0]
DiscreteNonParametric [-Inf, 6.0e-7, 0.197933, 0.193494, 0.0]
Pareto [-Inf, 6.0e-7, 6.69184e-108, 3.7704e-111, 0.0]
Uniform … [-Inf, 6.0e-7, 0.0, 0.0, 0.0]
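Worth noting: since `M_scores` holds vectors, `sortperm(..., rev = true)` compares them lexicographically, so the held-out log-likelihood (the first entry) drives the ranking and the p-values only break ties. A minimal sketch of that behavior, using three of the log-likelihoods from the table above:

```julia
# Toy score vectors [held-out log-likelihood, a p-value] for
# LogNormal, Exponential, and Gamma (values from the table above).
scores = [[-20600.7, 0.82], [-33380.3, 6.0e-7], [-21159.4, 6.0e-7]]

# sortperm compares vectors element-by-element (lexicographically),
# so the first entry — the log-likelihood — dominates the ordering.
idx = sortperm(scores, rev = true)   # → [1, 3, 2]: best log-likelihood first
```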
Basically this is predicting Y given X = constant, except that the prediction here is not a number but an (unconditional) distribution.
In MLJ the plan is to view fitting a distribution as probabilistic supervised learning where the input is X = nothing - a single point carrying no information. The data you have above would be the target, labelled y, and the prediction yhat is a single (probabilistic) prediction. The API is set up for this already - see https://alan-turing-institute.github.io/MLJ.jl/dev/adding_models_for_general_use/#Models-that-learn-a-probability-distribution-1 - but no one has contributed a model yet.
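To make that concrete, here is a dependency-free sketch in plain Julia - not the actual MLJ API, and the names `NormalFitter`, `fit_model`, `predict_model` are made up for illustration (also to avoid clashing with `Distributions.fit`) - of a "supervised" model whose input is X = nothing and whose prediction is a whole distribution:

```julia
struct NormalFitter end          # hyperparameter-free "model"

struct FittedNormal              # the learned "parameters": a whole distribution
    μ::Float64
    σ::Float64
end

# "Supervised" fit with X = nothing: all the information is in the target y.
function fit_model(::NormalFitter, X::Nothing, y::AbstractVector{<:Real})
    n = length(y)
    μ = sum(y) / n
    σ = sqrt(sum((yi - μ)^2 for yi in y) / (n - 1))  # sample std
    return FittedNormal(μ, σ)
end

# The prediction is not a number but the (unconditional) distribution itself.
predict_model(fitted::FittedNormal, X::Nothing) = fitted
```

The real interface would dispatch on the model type the same way, just with MLJ's `fit`/`predict` signatures from the link above.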
Is this what you are after?
Yes, that's it.
Btw, this case with X = nothing can be generalized.
For example: y = X*β + ε, where ε ~ F(θ) i.i.d., for a large class of probability distributions F for the error term.
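A sketch of that generalization in plain Julia (the helper name is made up, and Normal errors are chosen purely as an illustrative F(θ)): estimate β by least squares, then fit the error distribution to the residuals.

```julia
# Sketch: y = X*β + ε, ε ~ F(θ) i.i.d.  F is taken to be Normal here for
# illustration; with Distributions.jl one could instead fit_mle any
# candidate F to the residuals, as in the model-search loop above.
function fit_linear_with_errors(X::AbstractMatrix, y::AbstractVector)
    β = X \ y                        # least-squares estimate of β
    r = y .- X * β                   # residuals ê = y - Xβ̂
    n = length(r)
    μ = sum(r) / n
    σ = sqrt(sum((ri - μ)^2 for ri in r) / (n - 1))
    return (β = β, μ = μ, σ = σ)     # β̂ plus fitted error-distribution parameters
end
```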