MLJ.jl
Storing intermediate results of a Composite Model
Hi!
Is your feature request related to a problem? Please describe.
I am trying to use the learning network API and would like to store additional results in the fitresult
of my composite model. Could you provide some guidance on how to do this properly?
Describe the solution you'd like
Ideally I'd like to be able to store the value of any node that was computed at training time.
Describe alternatives you've considered
It seems that only the submodels' fitresults are natively stored, so one way to do it, I guess, would be to define some kind of ResultModel as a submodel for whatever value I would like, and compute the result in the fit! function of this model.
Additional context
I must add that the learning network I am trying to build is not regular, in that it will never be used for prediction; however, I feel that what I'm trying to do may be of general use in MLJ.
For instance, the following works fine, except that I can't retrieve the value of the final node because of the anonymization in return!. Moreover, I don't think this is appropriate anyway, as I guess all computations (except fitting) would be made again each time I call the node, right?
using MLJ
using Statistics  # for `mean`

LinearRegressor = @load LinearRegressor pkg=MLJLinearModels verbosity=0

mutable struct MyModel <: MLJ.DeterministicComposite
    model
end

function MLJ.fit(m::MyModel, verbosity, X, y)
    Xs = source(X)
    ys = source(y)
    mach = machine(m.model, Xs, ys)
    ypred = MLJ.predict(mach, Xs)
    μpred = node(x -> mean(x), ypred)
    σpred = node((x, μ) -> mean((x .- μ).^2), ypred, μpred)
    mach = machine(Deterministic(), Xs, ys; predict=σpred)
    fitresult, cache, report = return!(mach, m, verbosity)
    # attempt to expose the σpred node alongside the usual fitresult
    mach.fitresult = (σpred=σpred, fitresult...)
    return mach.fitresult, cache, report
end

X, y = make_regression(500, 5)
mach = machine(MyModel(LinearRegressor()), X, y)
fit!(mach)
fitted_params(mach)
mach.fitresult.σpred()
@olivierlabayle Thanks for raising this interesting question about creating new interface points for composite models.
Of course, if you are not interested in the ordinary predict output, you could just define the predict node to be σpred, with return!(machine(Xs, ys, predict=σpred), model, verbosity). But I don't think that is what you are getting at, right? You are looking to add ways of accessing information in addition to what can be extracted from predict and transform (you can already define both, incidentally).
But I am interested in clarifying exactly what you want here. I see two possible objectives. Do you want the output of σpred on the training data to be recorded somehow in the report or fitted params, or are you effectively seeking to add a new operation that can be called on new data, like a predict operation? That is, are we trying to record extra data as a by-product of training, or do we want to add extra functions that dispatch on both new data and the outcomes of training?
@ablaom Thanks for getting back to me so quickly.
I am working in causality, which means that my scenario differs from the traditional MLJ framework in the following ways:
- I don't have data as (X, y) but rather (X, W, y).
- I don't really have a predict time. In MLJ the learning algorithm outputs a prediction function, while I am only interested in outputting a real number (or vector).
The reason why I am so interested in the learning network API is that I think it provides a nice caching and scheduling mechanism. For instance, again in my use case, I might want to change one hyperparameter of model3 (see below), and the whole procedure will then not refit model1 and model2, because their upstream has not changed.
To cut it short, I think using the predict node (or, more reasonably, defining a new operation node) might work for me (as in the following), but I don't want the computations to happen twice. Moreover, this currently doesn't work because predict expects the data to be (X, y). The other solution would be to record some state information at fit time, as you mention; it seems both more appropriate for my use case and still useful for general MLJ users (for instance, I initially wanted to report the scores of the learners in the Stack). For general MLJ users that would be in addition to the predict function, and for me it would be all I require.
Hope this helps!
using MLJ
using Statistics  # for `mean`

LinearRegressor = @load LinearRegressor pkg=MLJLinearModels verbosity=0

mutable struct MyModel <: MLJ.DeterministicComposite
    model1
    model2
    model3
end

function MLJ.fit(m::MyModel, verbosity, X, W, y)
    Xs = source(X)
    Ws = source(W)
    ys = source(y)
    mach1 = machine(m.model1, Xs, ys)
    mach2 = machine(m.model2, Ws, ys)
    ypred1 = MLJ.predict(mach1, Xs)
    ypred2 = MLJ.predict(mach2, Ws)
    Y = hcat(ypred1, ypred2)
    mach3 = machine(m.model3, Y, ys)
    ypred3 = MLJ.predict(mach3, Y)
    μpred = node(x -> mean(x), ypred3)
    σpred = node((x, μ) -> mean((x .- μ).^2), ypred3, μpred)
    estimate = node((μ, σ2) -> (μ, σ2), μpred, σpred)
    mach = machine(Deterministic(), Xs, ys; predict=estimate)
    return!(mach, m, verbosity)
end

X, y = make_regression(500, 5)
model = MyModel(LinearRegressor(), LinearRegressor(), LinearRegressor())
mach = machine(model, X, X, y)
fit!(mach)
estimate = MLJ.predict(mach)
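To illustrate the caching behaviour described above, here is a sketch (not from the original thread; fit_intercept is used merely as an example hyperparameter of MLJLinearModels.LinearRegressor):

# Mutate only model3 and refit: the machines for model1 and model2 are not
# retrained, because neither their models nor their upstream data changed.
model.model3.fit_intercept = false
fit!(mach)  # only the third machine needs retraining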
@olivierlabayle I've played around with this a bit today and will get your feedback on one experiment in the next day or so.
@ablaom That's great, very happy to hear that, thanks a lot!
@olivierlabayle Please have a look at https://github.com/JuliaAI/MLJBase.jl/pull/644 which addresses the original suggestion and give me your feedback.
I think in the immediate term causal inference with targeted learning is out-of-scope. My focus for the next few months will be moving towards version 1.0.
Perhaps you can hack around the other obstacles for now, e.g. by exporting a predict node that you have no intention of using.
You might also want to conceptualise your model as a transformer with a single tuple (X, W, y) as input, which you split up.
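A minimal sketch of that transformer idea (the names and the single-tuple input are illustrative only, not an established MLJ interface):

using MLJ
using Statistics

# Sketch: export the whole procedure as an unsupervised transformer whose
# single input is the tuple (X, W, y), split up inside the network.
mutable struct TupleEstimator <: MLJ.UnsupervisedComposite
    model
end

function MLJ.fit(m::TupleEstimator, verbosity, data)  # data = (X, W, y)
    ds = source(data)
    Xs = node(d -> d[1], ds)
    ys = node(d -> d[3], ds)
    mach1 = machine(m.model, Xs, ys)
    ypred = MLJ.predict(mach1, Xs)
    μpred = node(x -> mean(x), ypred)
    σpred = node((x, μ) -> mean((x .- μ).^2), ypred, μpred)
    mach = machine(Unsupervised(), ds; transform=σpred)
    return!(mach, m, verbosity)
end

# usage, with LinearRegressor loaded as in the snippets above:
X, y = make_regression(500, 5)
mach = machine(TupleEstimator(LinearRegressor()), (X, X, y))
fit!(mach)
transform(mach, (X, X, y))  # the estimate, recomputed on the supplied tuple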
Yes, I understand, and I wasn't planning on having a dedicated MLJ structure for this. As you say, I will be hacking a bit; for now it's a model with an unused predict node, but I like the transformer idea. I think with this pull request I should be good to go and benefit from the learning network machinery.
Wouldn't it be more intuitive/self-explanatory to add a report kwarg to the surrogate machine call that takes a named tuple input? It would also allow fitted_params if it's necessary at some point in the future.
mach = machine(Deterministic(), Xs, ys; predict=ypred3, μpred=μpred, σpred=σpred)
would become
mach = machine(Deterministic(), Xs, ys; predict=ypred3, report=(μpred=μpred, σpred=σpred))
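If such a keyword were adopted, usage might look roughly like this (a hypothetical sketch, not the API as it existed at the time):

# After exporting the network with the proposed `report` kwarg and fitting
# the composite machine, the training-time values of the listed nodes would
# be retrievable from the composite's report:
fit!(mach)
report(mach).σpred  # value of the σpred node computed on the training data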
@ablaom I also stumbled over this issue while implementing composite detectors, which should store training scores in the report for the composite model.
@davnn Thanks for chiming in here.
I also thought of this, but it seemed a bit more complicated. But yes, as you say this may be "more intuitive/self-explanatory". I should be happy to make that change.
"which should store training scores in the report for the composite model"
Ah, yes, I can imagine that could be so. Does this mean we need to expedite this somewhat? Currently this is low on my priorities as I am swamped with other stuff.
Nope, consider it low prio as well, just using a custom return! for now.
@ablaom Thank you for implementing this feature!
I'm having a difficult time converting my custom return! to the new MLJ API (added in https://github.com/JuliaAI/MLJBase.jl/pull/644). Previously, I could just use
function return_with_scores!(network_mach, model, verbosity, scores_train, X)
    fitresult, cache, report = MLJ.return!(network_mach, model, verbosity)
    report = merge(report, (scores=scores_train(X),))
    return fitresult, cache, report
end
instead of return! to add a scores field to the report named tuple. Using the same function with the new MLJ API results in a report = (..., additions = (scores = [1,2,3], ...)), which means that there is no longer a unified API (between composite and individual models) to access the training scores. I would now have to check everywhere if the model is a composite and use report.additions.scores, or is there a better solution?
@davnn Good point. I suggest we add a raw_training_scores accessor function, as suggested in the tracking issue cross-referenced above. What do you think?
Thank you for your detailed thoughts on how we could go forward. I need some more time to think about it. I'm a bit afraid of feature creep in MLJ, but maybe that's not a big problem.
Alternatively, we could introduce more generic accessor functions, training_predictions(model, fitresult, report) and training_transformations(model, fitresult, report), which, when implemented, are syntactically equivalent to predict(model, fitresult, Xtrain) and transform(model, fitresult, Xtrain) but more efficient, because they just extract data pre-computed at fit time (and available in fitresult or report)? Mmm, might be a bit abstract for users?
In your use case, you overload training_transformations to return raw training scores for all detectors: for regular detectors, this is report.scores (or whatever - I forget what you call them), and for composite models it's report.additions.scores.
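A rough sketch of what such overloads might look like (MyDetector and MyCompositeDetector are hypothetical stand-ins, and the accessor itself is only proposed, not implemented):

# Dispatch on the model type, so callers never need to know where the
# training scores are actually stored:
training_transformations(::MyDetector, fitresult, report) = report.scores
training_transformations(::MyCompositeDetector, fitresult, report) =
    report.additions.scores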
I would prefer to keep the API simple, with a report that can flexibly accommodate predictions, transformations or whatever the algorithm could produce. Strangely enough, predict(model, fitresult, Xtrain) would NOT result in the training scores observed during fit for neighbor-based methods, because predict would compare the points in Xtrain to Xtrain, but fit ignores the first (trivial) neighbor.
It might make sense to follow the uniform access principle for things like the models' report, i.e. discourage or even disallow direct access to model intrinsics such as model.report and encourage report(model), which could be easily customized on a per-model basis to return any custom format.
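For illustration, a sketch of the uniform-access idea (all names here are hypothetical):

# A single accessor with a generic fallback; a particular model type can
# specialize it to present its report in any custom format, so user code
# never reaches into `model.report` directly:
report_of(model) = model.report                                  # fallback
report_of(model::MyCompositeDetector) = model.report.additions   # customized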
Thanks for these points. I have some ideas about how to do this properly (and also how to greatly simplify the learning networks "export" process) but it's going to take a little time. I will keep you posted, and I appreciate your patience.