Reinstate integration tests for SymbolicRegression?
The issue https://github.com/MilesCranmer/SymbolicRegression.jl/issues/390 is now resolved. However, the models are extremely slow to train relative to other models; integration tests on tiny data sets take several minutes (> 10 min), apparently because the default hyperparameters are inappropriate for small data.
I propose removing SymbolicRegression from the tests altogether, as it is not really needed for testing integration. We have plenty of other models of the same generic type.
@MilesCranmer
If you just need a quick test, you could set `niterations=1`? The default hyperparameters are much beefier.
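For reference, a minimal sketch of what such a quick test might look like, assuming the standard MLJ interface exported by SymbolicRegression (`SRRegressor`) and made-up toy data:

```julia
using MLJ, SymbolicRegression

# Toy data; values are illustrative only.
X = (x1 = rand(30), x2 = rand(30))
y = 2 .* X.x1 .+ X.x2

# Override the (beefy) default search budget with a single iteration.
model = SRRegressor(niterations = 1)
mach = machine(model, X, y)
fit!(mach)
```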
Thanks for the suggestion, but integration tests only test default values.
I’m not sure I understand the issue here. Do you mean that the defaults need to fall under a certain compute budget as part of the formal interface? But different algorithms have different costs, by their very nature. Maybe there could be an `MLJModelInterface.test_defaults(::Regressor)` trait to recommend defaults for unit-testing purposes?
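To make the suggestion concrete, a rough sketch of what such a trait could look like; `test_defaults` is hypothetical and not part of MLJModelInterface:

```julia
import MLJModelInterface as MMI

# Hypothetical trait: packages overload this to return a cheap variant
# of their model, suitable for integration/unit tests.
test_defaults(model::MMI.Model) = model  # fallback: assume defaults are already cheap

# A package like SymbolicRegression.jl could then define, e.g.:
# test_defaults(::SRRegressor) = SRRegressor(niterations = 1, populations = 3)
```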
Hi @ablaom,
I have created `SymbolicRegression.SRTestRegressor` and `SymbolicRegression.MultitargetSRTestRegressor`, which have lightweight defaults. Let me know if that helps.
Best, Miles
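Assuming the test variant mirrors `SRRegressor`'s interface, a sketch of how it would slot into the usual MLJ workflow (toy data again):

```julia
using MLJ, SymbolicRegression

X = (x1 = rand(30),)
y = 3 .* X.x1
mach = machine(SymbolicRegression.SRTestRegressor(), X, y)
fit!(mach)  # should finish quickly, thanks to the lightweight defaults
```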
Thanks for this. It might help, but, as mentioned earlier, I think we already have sufficient models for testing MLJ integration. If you want, you could add the integration tests yourself locally, as I think I sketched out earlier.
@Moef At present the integration tests don't catch warnings like the ones you encountered before, although I suppose this could be added.
The way registration works is that all models in a package get added. (That is, you register a package, not individual models.) So these new models will be discoverable by the general MLJ user. I can mask out the integration tests for the "normal" models, as at present, but to "hide" the test models from the user would require a hack, which I am reluctant to add. So only add these if you are happy for the general user to see them.
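For example, a user browsing the registry would see the test models alongside the real ones; a sketch using MLJ's model-search facilities (the filter is illustrative):

```julia
using MLJ

# List every registered model whose name mentions "SRTest".
test_models = models(m -> occursin("SRTest", m.name))
```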
Incidentally, if you add them, integration tests will be automatically included in the next MLJModels/MLJ update cycle.
The [compat] bounds for SymbolicRegression now appear so out of date that a very old version is being loaded in the integration tests, and the metadata of that old version clashes with the metadata in the model registry. This was causing problems with the integration tests, so I have removed SymbolicRegression completely from the MLJ Project [extras] section (the section that makes the package available to the integration tests). So, when SymbolicRegression is finally up to date again, we need to:
- [ ] Add SymbolicRegression to [extras], if we ever want to reinstate its integration testing.
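For the record, a sketch of the relevant Project.toml entries once that happens (the UUID placeholder and version bound are illustrative only):

```toml
[extras]
SymbolicRegression = "<UUID from the General registry>"

[compat]
SymbolicRegression = "1"  # bump to whatever the current release is
```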
Is this the [compat] here or in SR.jl?
I would recommend having a separate Project.toml for each integration test, rather than a single combined one. Check out Mooncake.jl for an example of this.
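Something like the following layout, i.e. one self-contained environment per model-providing package (directory and file names are illustrative):

```
test/integration/
├── SymbolicRegression/
│   ├── Project.toml   # pins only SymbolicRegression + test deps
│   └── runtests.jl
└── DecisionTree/
    ├── Project.toml
    └── runtests.jl
```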
We may need to do something like that. Another possibility is to require each package to run its own integration tests. The original idea was that a user could load all model-providing packages simultaneously into their MLJ workflow and expect the latest versions to load without issues. Otherwise, the behaviour of models depends on the particular combination loaded, which can make reproducibility difficult. Somewhat surprisingly, this has generally worked well, but it adds a maintenance burden that is becoming more challenging to stay on top of.
Also, we want the actual model traits to reflect those saved in the model registry, which is created using a Project.toml containing all the packages.
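That consistency requirement amounts to a check along these lines (a sketch only; the trait chosen here is just one example):

```julia
using MLJ

# Registry metadata, as recorded in MLJModels' Metadata.toml:
entry = info("SRRegressor", pkg="SymbolicRegression")

# Traits of the model actually loaded into the session:
SR = @load SRRegressor pkg=SymbolicRegression
model = SR()

# These should agree; a version mismatch can break this.
@assert input_scitype(model) == entry.input_scitype
```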