MLJ.jl

Meta issue: Issues for possible collaboration with UCL

Open ablaom opened this issue 5 years ago • 5 comments

  • Disintegration of MLJModels (medium) ~~https://github.com/alan-turing-institute/MLJModels.jl/issues/244 : priority would be for GLM, with a blank repo at https://github.com/alan-turing-institute/MLJGLMInterface.jl ; you could use https://github.com/alan-turing-institute/MLJNaiveBayesInterface.jl as a template.~~ done

  • Universal transformer for wrapping univariate transformers (medium) https://github.com/alan-turing-institute/MLJModels.jl/issues/288 : a more detailed design proposal is needed. Familiarity with the logic of the existing Standardizer is helpful; it may already be a good template for what we want to do here (just replace UnivariateStandardizer with a user-specified transformer). Need to account for inverse_transform when it is implemented.

  • Disintegration of MLJBase (medium) parts of https://github.com/alan-turing-institute/MLJBase.jl/issues/416 , in particular Serialization and OpenML, which seem to be hefty. Worth exploring which dependencies are causing most latency. Also, StatisticalMeasures (medium-long). Added note: Measures currently depend on UnivariateFinite, which in turn depends on Distributions, but only the base API. See this issue: https://github.com/alan-turing-institute/MLJBase.jl/issues/504

  • [x] pdfnorm for Distributions.jl (??) https://github.com/JuliaStats/Distributions.jl/issues/806: this is one I believe Mose discussed with @fkiraly but was not completed, in an earlier engagement. Would be good to know what the status of that work is.

  • [x] investigate source of package compiler issues (medium) (https://github.com/alan-turing-institute/MLJBase.jl/issues/427). Suggest commenting out src/composition/ for a start.

  • Review/Redesign of model registry (long) https://github.com/alan-turing-institute/MLJModels.jl/issues/321

  • ~~Test new API proposal to improve data resampling performance (medium) https://github.com/alan-turing-institute/MLJBase.jl/issues/309#issuecomment-633733155~~ done

  • [x] Add visualisation to model tuning results (medium) https://github.com/alan-turing-institute/MLJTuning.jl/issues/41

  • Populate model metadata with good default hyperparameter ranges (short-medium) https://github.com/alan-turing-institute/MLJModels.jl/issues/322

  • Allow use of sample and class weights in sk-learn models (medium) https://github.com/alan-turing-institute/MLJScikitLearnInterface.jl/issues/17 (and the related https://github.com/alan-turing-institute/MLJModels.jl/issues/127)

  • [ ] Add control over logging level (short) https://github.com/alan-turing-institute/MLJ.jl/issues/255

added mid November

  • [ ] ~~cleanup of measures (short) https://github.com/alan-turing-institute/MLJBase.jl/issues/450~~ done

added early January 2020

  • [ ] roll out data front-ends for models (medium) Add the optional data front-end to individual models, which becomes possible after https://github.com/alan-turing-institute/MLJBase.jl/pull/501 .

  • [x] TLC for DataScienceTutorials (short - medium) The tutorials need updating to the latest version of MLJ, and some contributors have made PRs that are languishing.
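
For the universal-transformer item above (MLJModels.jl issue 288), here is a rough sketch of the idea in plain Julia, assuming a column table given as a NamedTuple of vectors. The names `UnivariateScaler`, `ColumnwiseTransformer`, `selected` and `apply` are hypothetical, and `fit`/`transform`/`inverse_transform` are local stand-ins rather than the MLJ model API; the actual design is still open.

```julia
using Statistics: mean, std

# Hypothetical univariate transformer: standardise a single vector.
struct UnivariateScaler end
fit(::UnivariateScaler, v::AbstractVector) = (mean(v), std(v))
transform(::UnivariateScaler, fitresult, v) = (v .- fitresult[1]) ./ fitresult[2]
inverse_transform(::UnivariateScaler, fitresult, w) = w .* fitresult[2] .+ fitresult[1]

# Hypothetical wrapper: apply any univariate transformer to each selected column.
struct ColumnwiseTransformer{T}
    transformer::T
    features::Vector{Symbol}   # empty means "all columns"
end

selected(m::ColumnwiseTransformer, X::NamedTuple) =
    isempty(m.features) ? collect(keys(X)) : m.features

# One fitresult per selected column, keyed by column name.
fit(m::ColumnwiseTransformer, X::NamedTuple) =
    Dict(name => fit(m.transformer, X[name]) for name in selected(m, X))

# Apply `f` (transform or inverse_transform) to fitted columns; pass the rest through.
function apply(f, m::ColumnwiseTransformer, fitresults, X::NamedTuple)
    cols = map(keys(X)) do name
        haskey(fitresults, name) ? f(m.transformer, fitresults[name], X[name]) : X[name]
    end
    return NamedTuple{keys(X)}(cols)
end

transform(m::ColumnwiseTransformer, fr, X) = apply(transform, m, fr, X)
inverse_transform(m::ColumnwiseTransformer, fr, X) = apply(inverse_transform, m, fr, X)
```

In this sketch `inverse_transform` simply forwards to the wrapped transformer, so a wrapped transformer without an inverse would raise a `MethodError`; whether that is the right behaviour is one of the design questions.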

ablaom, Oct 14 '20 06:10

cc @giordano @vollmersj

ablaom, Oct 14 '20 19:10

> this is one I believe Mose discussed with @fkiraly but was not completed, in an earlier engagement. Would be good to know what the status of that work is.

None of that happened ... well, in Julia. It exists in R now, with a number of other useful distribution methods: https://github.com/alan-turing-institute/distr6 https://github.com/alan-turing-institute/distr6/issues/196 @RaphaelS1 and @aintoha know all about it.

Ultimately, and perhaps not too surprisingly, we also ended up at a point where double dispatch would be great (function space cross-products of distribution-defining functions), but R6 doesn't have an easy way to do double dispatch. R7 perhaps...

On a side note, @aintoha also calculated a larger batch of integrals that might be useful to re-use instead of re-deriving them.
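
To illustrate what double dispatch buys here, a minimal Julia sketch: the L2 inner product of two densities, with a numerical fallback and closed forms selected on the types of both arguments. The types `Gaussian` and `Unif` and the functions `pdf`, `l2inner` and `pdfsqnorm` are hypothetical stand-ins defined in the block, not the distr6 or Distributions.jl API.

```julia
# Hypothetical minimal distribution types (stand-ins, not Distributions.jl types).
abstract type Dist end

struct Gaussian <: Dist
    μ::Float64
    σ::Float64
end

struct Unif <: Dist
    a::Float64
    b::Float64
end

pdf(d::Gaussian, x) = exp(-((x - d.μ) / d.σ)^2 / 2) / (d.σ * sqrt(2π))
pdf(d::Unif, x) = d.a <= x <= d.b ? 1 / (d.b - d.a) : 0.0

# Generic fallback: midpoint-rule approximation of ∫ p(x) q(x) dx on [lo, hi].
function l2inner(p::Dist, q::Dist; lo=-20.0, hi=20.0, n=400_000)
    h = (hi - lo) / n
    return h * sum(pdf(p, lo + (i - 0.5) * h) * pdf(q, lo + (i - 0.5) * h) for i in 1:n)
end

# Specialised method, selected on the types of BOTH arguments:
# ∫ N(x; μ1, σ1) N(x; μ2, σ2) dx = N(μ1 - μ2; 0, sqrt(σ1² + σ2²)).
function l2inner(p::Gaussian, q::Gaussian)
    s = sqrt(p.σ^2 + q.σ^2)
    return exp(-((p.μ - q.μ) / s)^2 / 2) / (s * sqrt(2π))
end

# Closed form for two uniforms: overlap length over the product of the widths.
function l2inner(p::Unif, q::Unif)
    overlap = max(0.0, min(p.b, q.b) - max(p.a, q.a))
    return overlap / ((p.b - p.a) * (q.b - q.a))
end

# Squared L2 norm of a density is its inner product with itself.
pdfsqnorm(d::Dist) = l2inner(d, d)
```

A mixed pair such as `l2inner(Gaussian(0.0, 1.0), Unif(-1.0, 1.0))` falls through to the numerical method, and each newly derived integral can be added as one more method without touching the existing ones.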

fkiraly, Oct 19 '20 19:10

> It exists in R now, with a number of other useful distribution methods: https://github.com/alan-turing-institute/distr6 alan-turing-institute/distr6#196 @RaphaelS1 and @aintoha know all about it.

Really good to know, thanks!!

cc @giordano

> R7 perhaps...

Ha ha.

ablaom, Oct 19 '20 21:10

@giordano I have reviewed the checklist today, 11 March 2021. The items with a checkbox still look good, more or less in the order given. In particular, I suggest starting with a review of latency in MLJBase and moving measures out. Let's talk details in a call.

ablaom, Mar 10 '21 22:03

Some miscellaneous "smaller" issues:

https://github.com/alan-turing-institute/MLJBase.jl/issues/573

cc @giordano

ablaom, Jun 06 '21 21:06

Closing as stale.

ablaom, Feb 21 '24 23:02