Meta issue: Issues for possible collaboration with UCL
- Disintegration of MLJModels (medium) ~~https://github.com/alan-turing-institute/MLJModels.jl/issues/244 : priority would be for GLM, with a blank repo at https://github.com/alan-turing-institute/MLJGLMInterface.jl ; you could use https://github.com/alan-turing-institute/MLJNaiveBayesInterface.jl as a template.~~ done
- Universal transformer for wrapping univariate transformers (medium) https://github.com/alan-turing-institute/MLJModels.jl/issues/288 : more detailed design proposal needed. Familiarity with the logic of the existing `Standardizer` is helpful. This may already be a good template for what we want to do here (just replace `UnivariateStandardizer` by a user-specified one). Need to worry about `inverse_transform` when implemented.
- Disintegration of MLJBase (medium) parts of https://github.com/alan-turing-institute/MLJBase.jl/issues/416 , in particular Serialization and OpenML, which seem to be hefty. Worth exploring which dependencies are causing most latency. Also, StatisticalMeasures (medium-long). Added note: Measures currently depend on UnivariateFinite, which in turn depends on Distributions, but only the base API. See this issue: https://github.com/alan-turing-institute/MLJBase.jl/issues/504
- [x] pdfnorm for Distributions.jl (??) https://github.com/JuliaStats/Distributions.jl/issues/806: this is one I believe Mose discussed with @fkiraly but was not completed, in an earlier engagement. Would be good to know what the status of that work is.
- [x] investigate source of package compiler issues (medium) (https://github.com/alan-turing-institute/MLJBase.jl/issues/427). Suggest commenting out `src/composition/` for a start.
- [ ]
- Review/Redesign of model registry (long) https://github.com/alan-turing-institute/MLJModels.jl/issues/321
- ~~Test new API proposal to improve data resampling performance (medium) https://github.com/alan-turing-institute/MLJBase.jl/issues/309#issuecomment-633733155~~ done
- [x] Add visualisation to model tuning results (medium) https://github.com/alan-turing-institute/MLJTuning.jl/issues/41
- Populate model metadata with good default hyperparameter ranges (short-medium) https://github.com/alan-turing-institute/MLJModels.jl/issues/322
- Allow use of sample and class weights in sk-learn models (medium) https://github.com/alan-turing-institute/MLJScikitLearnInterface.jl/issues/17 (and the related https://github.com/alan-turing-institute/MLJModels.jl/issues/127)
- [ ] Add control over logging level (short) https://github.com/alan-turing-institute/MLJ.jl/issues/255
added mid November
- [ ] ~~cleanup of measures (short) https://github.com/alan-turing-institute/MLJBase.jl/issues/450~~ done
added early January 2020
- [ ] roll out data front-ends for models (medium) Implement the optional data front-end that models will be able to implement after https://github.com/alan-turing-institute/MLJBase.jl/pull/501 .
- [x] TLC for DataScienceTutorials (short - medium) The tutorials need updating to the latest version of MLJ, and some contributors have made PRs that are languishing.
cc @giordano @vollmersj
> this is one I believe Mose discussed with @fkiraly but was not completed, in an earlier engagement. Would be good to know what the status of that work is.
None of that happened ... well, in Julia. It exists in R now, with a number of other useful distribution methods: https://github.com/alan-turing-institute/distr6 https://github.com/alan-turing-institute/distr6/issues/196 @RaphaelS1 and @aintoha know all about it.
Ultimately, and perhaps not too surprisingly, we also ended up at a point where double dispatch would be great (function space cross-products of distribution defining functions), but R6 doesn't have an easy way for double dispatch. R7 perhaps...
On a side note, @aintoha also calculated a larger batch of integrals that might be useful to re-use instead of re-deriving them.
> It exists in R now, with a number of other useful distribution methods: https://github.com/alan-turing-institute/distr6 alan-turing-institute/distr6#196 @RaphaelS1 and @aintoha know all about it.
Really good to know, thanks!!
cc @giordano
> R7 perhaps...
Ha ha.
@giordano I have reviewed the checklist today, 11 March 2021. The items with a checkbox still look good, more-or-less in the order given. In particular, I suggest starting with a review of latency in MLJBase and moving measures out. Let's talk details in a call.
Some miscellaneous "smaller" issues:
https://github.com/alan-turing-institute/MLJBase.jl/issues/573
cc @giordano
Closing as stale