
Storing intermediate results of a Composite Model

Open olivierlabayle opened this issue 3 years ago • 16 comments

Hi!

Is your feature request related to a problem? Please describe. I am trying to use the learning network API and would like to store additional results in the fitresult of my composite model. Could you provide some guidance on how to do this properly?

Describe the solution you'd like Ideally I'd like to be able to store the value of any node that was computed at training time.

Describe alternatives you've considered It seems that only the submodels' fitresults are natively stored, so one way to do it, I guess, would be to define some kind of ResultModel as a submodel for whatever value I would like, and compute the result in the fit! function of this model (see the sketch below).
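
A minimal sketch of what I mean (ResultModel is a made-up name; it learns nothing and simply captures, as its fitresult, the training-time value of whatever node its machine is bound to):

using MLJ

mutable struct ResultModel <: MLJ.Unsupervised end

# fit just records the incoming value; nothing is learned
MLJ.fit(::ResultModel, verbosity, X) = (X, nothing, nothing)

Binding it to a node n inside the network, as in machine(ResultModel(), n), would then presumably make the value of n computed at fit! time retrievable from the composite's fitted params.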

Additional context

I must add that the learning network I am trying to build is unusual in that it will never be used for prediction; however, I feel that what I'm trying to do may be of general use in MLJ.

For instance, the following works fine, except that I can't retrieve the value of the final node because of the anonymization in return!. Moreover, I don't think this approach is appropriate, as I guess all computations (except fitting) would be performed again each time I call the node, right?


using MLJ
using Statistics  # for mean

LinearRegressor = @load LinearRegressor pkg=MLJLinearModels verbosity = 0


mutable struct MyModel <: MLJ.DeterministicComposite
    model
end


function MLJ.fit(m::MyModel, verbosity, X, y)
    Xs = source(X)
    ys = source(y)
    mach = machine(m.model, Xs, ys)
    ypred = MLJ.predict(mach, Xs)
    # mean and variance of the predictions over the training data:
    μpred = node(mean, ypred)
    σpred = node((x, μ) -> mean((x .- μ).^2), ypred, μpred)
    mach = machine(Deterministic(), Xs, ys; predict=σpred)
    fitresult, cache, report = return!(mach, m, verbosity)
    # hack: smuggle the σpred node into the fitresult so it can be
    # called after training
    mach.fitresult = (σpred=σpred, fitresult...)
    return mach.fitresult, cache, report
end

X, y = make_regression(500, 5)
mach = machine(MyModel(LinearRegressor()), X, y)
fit!(mach)
fitted_params(mach)
mach.fitresult.σpred()

olivierlabayle avatar Sep 16 '21 17:09 olivierlabayle

@olivierlabayle Thanks for raising this interesting question about creating new interface points for composite models.

Of course, if you are not interested in the ordinary predict output, you could just define the predict node to be σpred, with return!(machine(Xs, ys, predict=σpred), model, verbosity). But I don't think that is what you are getting at, right? You are looking to add ways of accessing information in addition to what can be extracted from predict and transform (you can already define both, incidentally).
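
For the record, defining both might look like this (an untested sketch, reusing the nodes from your example):

# expose σpred as the transform operation alongside the usual predictions:
mach = machine(Deterministic(), Xs, ys; predict=ypred, transform=σpred)
fitresult, cache, report = return!(mach, m, verbosity)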

But I am interested in clarifying exactly what you want here. I see two possible objectives. Do you want the output of σpred on the training data to be recorded somehow in the report or fitted params, or are you effectively seeking to add a new operation that can be called on new data, like a predict operation? That is, are we trying to record extra data as a by-product of training, or do we want to add extra functions that dispatch on both new data and the outcomes of training?

ablaom avatar Sep 17 '21 01:09 ablaom

@ablaom Thanks for getting back to me so quickly.

I am working in causal inference, which means that my scenario differs from the traditional MLJ framework in the following ways:

  • I don't have data as (X, y) but rather (X, W, y).
  • I don't really have a predict time. In MLJ the learning algorithm outputs a prediction function, while I am only interested in outputting a real number (or vector).

The reason I am so interested in the learning network API is that I think it provides a nice caching and scheduling mechanism. For instance, again in my use case, I might want to change one hyperparameter of model3 (see below) without the whole procedure refitting model1 and model2, since their upstream has not changed.

To cut it short, I think using the predict node (or, more reasonably, defining a new operation node) might work for me (as in the following), but I don't want the computations to happen twice. Moreover, this currently doesn't work because predict expects the data to be (X, y). The other solution would be to record some information at fit time, as you mention; that seems both more appropriate for my use case and still useful for general MLJ users (for instance, I initially wanted to report the scores of the learners in the Stack). For general MLJ users that would be in addition to the predict function, and for me it would be all I require.

Hope this helps!


using MLJ
using Statistics  # for mean

LinearRegressor = @load LinearRegressor pkg=MLJLinearModels verbosity = 0


mutable struct MyModel <: MLJ.DeterministicComposite
    model1
    model2
    model3
end


function MLJ.fit(m::MyModel, verbosity, X, W, y)
    Xs = source(X)
    Ws = source(W)
    ys = source(y)

    # first-level learners:
    mach1 = machine(m.model1, Xs, ys)
    mach2 = machine(m.model2, Ws, ys)

    ypred1 = MLJ.predict(mach1, Xs)
    ypred2 = MLJ.predict(mach2, Ws)

    # second-level learner, trained on the first-level predictions:
    Y = hcat(ypred1, ypred2)
    mach3 = machine(m.model3, Y, ys)
    ypred3 = MLJ.predict(mach3, Y)

    # the quantities I actually want: mean and variance of the final
    # predictions over the training data:
    μpred = node(mean, ypred3)
    σpred = node((x, μ) -> mean((x .- μ).^2), ypred3, μpred)
    estimate = node((μ, σ2) -> (μ, σ2), μpred, σpred)

    # expose the estimate through a predict node, for want of anywhere else:
    mach = machine(Deterministic(), Xs, ys; predict=estimate)
    return!(mach, m, verbosity)
end

X, y = make_regression(500, 5)
model = MyModel(LinearRegressor(), LinearRegressor(), LinearRegressor())
mach = machine(model, X, X, y)
fit!(mach)
estimate = MLJ.predict(mach)
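
To make the caching point above concrete, this is the behaviour I'd expect (a sketch; I'm assuming fit_intercept is a hyperparameter of this LinearRegressor):

# changing a hyperparameter of model3 only; refitting should retrain
# mach3 alone, since the upstream of mach1 and mach2 is unchanged
model.model3.fit_intercept = false
fit!(mach)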

olivierlabayle avatar Sep 17 '21 09:09 olivierlabayle

@olivierlabayle I've played around with this a bit today and will ask for your feedback on one experiment in the next day or so.

ablaom avatar Sep 23 '21 06:09 ablaom

@ablaom That's great, very happy to hear that, thanks a lot!

olivierlabayle avatar Sep 23 '21 17:09 olivierlabayle

@olivierlabayle Please have a look at https://github.com/JuliaAI/MLJBase.jl/pull/644 which addresses the original suggestion and give me your feedback.

I think in the immediate term causal inference with targeted learning is out-of-scope. My focus for the next few months will be moving towards version 1.0.

Perhaps you can hack around the other obstacles for now, e.g. by exporting a predict node that you have no intention of using.

You might also want to conceptualise your model as a transformer with a single tuple (X, W, y) as input, which you split up.
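
Roughly like this (just a sketch; the tuple-splitting nodes are illustrative):

data = source((X, W, y))        # a single source wrapping the whole tuple
Xs = node(t -> t[1], data)
Ws = node(t -> t[2], data)
ys = node(t -> t[3], data)
# ... build the rest of the network on Xs, Ws and ys as before ...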

ablaom avatar Sep 24 '21 00:09 ablaom

Yes, I understand, and I wasn't planning on having a dedicated MLJ structure for this. As you say, I will be hacking a bit; for now it's a model with an unused predict node, but I like the transformer idea. I think with this pull request I should be good to go and able to benefit from the learning network machinery.

olivierlabayle avatar Sep 24 '21 15:09 olivierlabayle

Wouldn't it be more intuitive/self-explanatory to add a report kwarg to the surrogate machine call, taking a named tuple as input? It would also allow a fitted_params kwarg, should that become necessary at some point in the future.

mach = machine(Deterministic(), Xs, ys; predict=ypred3, μpred=μpred, σpred=σpred)

would become

mach = machine(Deterministic(), Xs, ys; predict=ypred3, report=(μpred=μpred, σpred=σpred))
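
After fitting, the recorded values would then presumably surface along these lines (hypothetical access path):

fit!(mach)
report(mach).μpred   # value of the μpred node, as computed at fit time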

@ablaom I also stumbled over this issue while implementing composite detectors, which should store training scores in the report for the composite model.

davnn avatar Oct 05 '21 06:10 davnn

@davnn Thanks for chiming in here.

I also thought of this, but it seemed a bit more complicated. But yes, as you say this may be "more intuitive/self-explanatory". I should be happy to make that change.

which should store training scores in the report for the composite model.

Ah, yes, I can imagine that could be so. Does this mean we need to expedite this somewhat? Currently this is low on my priorities as I am swamped with other stuff.

ablaom avatar Oct 05 '21 19:10 ablaom

Ah, yes, I can imagine that could be so. Does this mean we need to expedite this somewhat? Currently this is low on my priorities as I am swamped with other stuff.

Nope, consider it low prio as well, just using a custom return! for now.

davnn avatar Oct 06 '21 05:10 davnn

@ablaom Thank you for seeing this feature through!

olivierlabayle avatar Jan 04 '22 14:01 olivierlabayle

I'm having a difficult time converting my custom return! to the new MLJ API (added in https://github.com/JuliaAI/MLJBase.jl/pull/644). Previously, I could just use

function return_with_scores!(network_mach, model, verbosity, scores_train, X)
    fitresult, cache, report = MLJ.return!(network_mach, model, verbosity)
    # append the training scores to the composite model's report:
    report = merge(report, (scores=scores_train(X),))
    return fitresult, cache, report
end

instead of return! to add a scores field to the report named tuple. Using the same function with the new MLJ API results in report = (..., additions = (scores = [1, 2, 3], ...)), which means that there is no longer a unified API (between composite and individual models) for accessing the training scores. I would now have to check everywhere whether the model is a composite and use report.additions.scores, or is there a better solution?
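
For now, the best workaround I see is a helper along these lines (hypothetical; it just dispatches on the presence of the additions field):

training_scores(report) =
    hasproperty(report, :additions) ? report.additions.scores : report.scores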

davnn avatar Aug 18 '22 11:08 davnn

@davnn Good point. I suggest we add a raw_training_scores accessor function as suggested in the tracking issue cross-referenced above.

What do you think?

ablaom avatar Aug 19 '22 02:08 ablaom

Thank you for your detailed thoughts on how we could go forward. I need some more time to think about it. I'm a bit afraid of feature creep in MLJ, but maybe that's not a big problem.

davnn avatar Aug 21 '22 19:08 davnn

Alternatively, we could introduce more generic accessor functions, training_predictions(model, fitresult, report) and training_transformations(model, fitresult, report) which, when implemented, are syntactically equivalent to predict(model, fitresult, Xtrain) and transform(model, fitresult, Xtrain) but more efficient, because they just extract data pre-computed at fit time (and available in fitresult or report)? Mmm, might be a bit abstract for users?

In your use case, you overload training_transformations to return training raw scores for all detectors: for regular detectors, this is report.scores (or whatever - I forget what you call them) and for composite models it's report.additions.scores.
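
Schematically, something like this (names as proposed above; the overloads are purely illustrative):

# generic fallback for the proposed accessor:
training_transformations(model, fitresult, report) =
    throw(ArgumentError("not implemented for $(typeof(model))"))

# a regular detector pulls scores recorded at fit time:
# training_transformations(::SomeDetector, fitresult, report) = report.scores

# a composite detector finds them under additions:
# training_transformations(::SomeCompositeDetector, fitresult, report) =
#     report.additions.scores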

ablaom avatar Aug 23 '22 01:08 ablaom

I would prefer to keep the API simple with a report that can flexibly accommodate predictions, transformations or whatever the algorithm could produce. Strangely enough, predict(model, fitresult, Xtrain) would NOT result in the training scores observed during fit for neighbor-based methods, because predict would compare the points in Xtrain to Xtrain, but fit ignores the first (trivial) neighbor.

It might make sense to follow the uniform access principle for things like the models' report, i.e. discourage or even disallow direct access to model intrinsics such as model.report and encourage report(model), which could be easily customized on a per-model basis to return any custom format.

davnn avatar Aug 23 '22 17:08 davnn

Thanks for these points. I have some ideas about how to do this properly (and also how to greatly simplify the learning networks "export" process) but it's going to take a little time. I will keep you posted, and I appreciate your patience.

ablaom avatar Aug 29 '22 03:08 ablaom