MLJModels.jl icon indicating copy to clipboard operation
MLJModels.jl copied to clipboard

Add PartitionedLS model to the registry

Open boborbt opened this issue 1 year ago • 6 comments

Hi, I would like to have PartitionedLS added to the registry.

A PartitionedLS model solves a constrained least squared problem where the features are partitioned by the user into groups. The solution is constrained to have all features in a group contributing in the same "direction" (i.e., positively or negatively) to the solution. Also, the solution is given in terms of two kind of variables: $\beta$ variables specify how much and in which direction each group contributes to the solution, $\alpha$ variables specify how much each feature contributes to each group.

The implementation is here. It supports MLJ interface since version 1.0.0.

The code supporting the MLJ interface can be found here.

More information about the technique can be found in this paper (currently submitted to the Machine Learning Journal after a major revision).

boborbt avatar Apr 05 '24 12:04 boborbt

Great to have this contribution, thanks.

A cursory look at the code suggests this looks good. Can you please bring the the hyperparmeter struct docstring up to spec?

@ablaom To do:

  • [x] Make a PR at PartionedLS.jl adding MLJInterface.jl tests
  • [x] Make local integration tests
  • [x] Add model to registry (after docstring is ready) https://github.com/JuliaRegistries/General/pull/104858
  • [x] Update "Model Browser" at MLJ

ablaom avatar Apr 07 '24 20:04 ablaom

Waiting for:

  • [x] https://github.com/ml-unito/PartitionedLS.jl/issues/4
  • [ ] Feedback on https://github.com/ml-unito/PartitionedLS.jl/pull/7

ablaom avatar Apr 07 '24 21:04 ablaom

Hi,

A cursory look at the code suggests this looks good. Can you please bring the the hyperparmeter struct docstring up to spec?

I've tried to follow the spec, but I was not able to use the $(MLJModelInterface.doc_header(...)) call. For some reason if I include it precompiling the package fails asserting that MLJModelInterface is not defined (regardless to importing or using it).

To overcome the problem I updated the docstring for PartLS to match the required format. I left the docstring I was trying to use at the end of the file, with the only modification that $(MLJModelInterface... is commented out. If you have any suggestion about how to fix the problem I will be glad to incorporate it.

You can find the updated files in commit bcafc93.

boborbt avatar Apr 08 '24 07:04 boborbt

Waiting for:

This has been fixed in commit 6b9a3f7.

boborbt avatar Apr 08 '24 07:04 boborbt

Thanks @boborbt for that. Would you consider updating the docstring as indicated above?

ablaom avatar Apr 08 '24 22:04 ablaom

Okay, I see this is done, many thanks! Please see my proposed update, https://github.com/ml-unito/PartitionedLS.jl/pull/7.

ablaom avatar Apr 09 '24 00:04 ablaom

Thanks @boborbt for that. Would you consider updating the docstring as indicated above?

As I mentioned above, I tried my best, but was not able to use the $(MLJModelInterface.doc_header(...)) call. It breaks compilation and I could not figure out why -- the error message says it cannot find MMI (which is defined as an alias for MLJModelInterface).

As a workaround I reported the docstring as a "normal" docstring for the struct. If you have any hint about to overcome the compilation problem I am more than willing to incorporate the fix.

Thanks.

boborbt avatar Apr 09 '24 08:04 boborbt

I've tried to follow the spec, but I was not able to use the $(MLJModelInterface.doc_header(...)) call. For some reason if I include it precompiling the package fails asserting that MLJModelInterface is not defined (regardless to importing or using it).

Sorry, I completely missed this.

I don't think we need to worry about including the MLJModelInterface.doc_header call; it's just a convenience macro. It is used without problems for dozens of other models; sorry I can't guess the issue and don't have the bandwidth to investigate just now.

Now you have made Float64 generic, I suggest we amend the docstring as follows:

- `X`: any matrix with element type `Float64`, or any table with columns of type `Float64`

is replaced with

- `X`: any matrix or table with `Continuous` element scitype. Check column scitypes of a table `X` with `schema(X)`.

- `y`: any vector with `Continuous` element scitype. Check scitype with `scitype(y)`. 

Then I think we're ready to roll.

ablaom avatar Apr 09 '24 21:04 ablaom

I just updated the documentation as suggested (see commit b1ca7ba9).

By the way, thanks for all the suggestions you gave me. It was worth registering to MLJ just for the free code review it comes with it 😉.

boborbt avatar Apr 10 '24 00:04 boborbt

Any news about this issue? I believe everything needed is in place now.

boborbt avatar Apr 12 '24 05:04 boborbt

Sorry, I posted an comment but must have forgotten to click "comment".

I'm waiting for you to tag a new release to make the doc strings live. The registration process grabs the doc strings from the latest tagged release to include in the registry metadata (which is searchable from MLJ without loading model-providing code) and appears here also.

Please ping me when this is ready.

ablaom avatar Apr 12 '24 09:04 ablaom

No problem @ablaom, I just tagged version 1.0.4, registered on Julia general registry and generated a new release on github. The tagged version is this one.

boborbt avatar Apr 12 '24 11:04 boborbt

Fixed the issue you opened yesterday.

boborbt avatar Apr 13 '24 06:04 boborbt

I would like to add support for PrecompileTools, but to do so I would need to add a workload including fitting the whole model which implies using MLJBase.machine. This makes MLJBase a dependency of the package, which was frown upon in one of your previous messages.

What do you suggest? Do I drop support of PrecompileTools? Do I add MLJBase to the dependencies? Do you see any way to add it as a non-hard depenedency?

If you need to have a look, you find the candidate code in branch precompile-tools.

boborbt avatar Apr 13 '24 08:04 boborbt

I suggest you only add a precompile load for your "native" API. If your native API depends on MLJBase to work, then we ought to be able to fix that. For example, if you use MLJModelInterface.matrix (which won't run properly without MLJBase loaded) then you can use import Tables; Tables.matrix instead. Let me know what causes you issues.

There is a known issue preventing precompilation loads for MLJBase having much impact: https://github.com/JuliaAI/MLJBase.jl/pull/900 . This has to do with the way we made MLJBase an "optional" dependency for model-providing packages before Julia introduced weak dependencies and package extensions (and recompilation workloads) to do this. And we can't "modernise" without breaking MLJModelInterface, which we really don't want to do right now.

I think this issue would carry over to your case: If you make MLJBase a hard dependency, then you won't actually realize much benefit.

I'm no expert on this issue, but this is my general impression.

ablaom avatar Apr 13 '24 21:04 ablaom

https://alan-turing-institute.github.io/MLJ.jl/dev/models/PartLS_PartitionedLS/

ablaom avatar Apr 14 '24 04:04 ablaom

Thanks @boborbt for your patience with the model adding process. It was helpful to have the prompt responses to my suggestions to get this sorted quickly. Congratulations again on the new package.

If you have not already done so, you may like to announce it here at some point.

ablaom avatar Apr 15 '24 02:04 ablaom