MLUtils.jl
Scope of this package
This package is kickstarting the plan outlined in https://github.com/JuliaML/LearnBase.jl/issues/49
- For the moment we can add both definitions and implementations here; at some point we will move the basic definitions to a LearnAPI.jl package (to be created).
- We can gradually move here the functionality from MLLabelUtils and MLDataPattern. We mainly want to add the functionality that is effectively in use in the ecosystem (e.g. FastAI.jl and MLJ.jl) and leave the rest out to avoid extra maintenance complexity.
@darsnack @johnnychen94
There is already a package called MLBase.jl: https://github.com/JuliaStats/MLBase.jl
Ouch! I will rename to MLUtils then
It would be nice if you could coordinate the names of all these packages to simplify and unite the pieces. Off the top of my head, the list includes MLBase.jl, MLJBase.jl, LearnBase.jl, ML*Utils.jl, ...
The name MLUtils.jl doesn't help improve this situation. Are you aiming for a base package or a utils package? These are very different AFAICT. Can we change the name to something more "final" and spread the word to the current maintainers of the other packages so that they can join efforts in a single place?
I'd like to have here functionality mostly from
- LearnBase
- MLDataPattern
- MLLabelUtils
Those packages have been in maintenance hell for years. We could also have stuff from
- PenaltyFunctions
- LossFunctions
- ObjectiveFunctions
- MLMetrics
all of which seem poorly maintained but also quite lean and stable, with perhaps only some cruft to remove.
I'm not sure what the most appropriate name would be. Do you have suggestions? We could archive LearnBase, MLDataPattern, and MLLabelUtils after we have done the port.
I love the idea of joining these "util" packages into a single hub with core functionality and also support the idea of deprecating these packages afterwards. This will potentially solve the premature modularization problem in JuliaML. As for the name we could pick a name that reflects this "Base" or "Core" nature of the effort.
The packages I'm maintaining in this org are LossFunctions.jl, TableTransforms.jl, and TableDistances.jl. I'm going on vacation now but will be back in January to try to help.
Is the plan to make changes here directly instead of "fixing" the existing packages? What's the timeline here? I would say we at least need a couple of cycles with the old packages using the same interface to make the transition smooth.
I would suggest MLDataUtilities as a name. Let's make it explicitly about working with data. There's no reason the release cycle/maturity of the package should be the same for data vs. losses, etc.
Are there any communication channels open between JuliaML and JuliaStats? MLBase has essentially seen 0 development since 2018, so it would be good to know if anyone still has plans for that package.
But wait, there's more! https://github.com/JuliaAI/StatisticalTraits.jl exists as well. We should try to get folks from each org on an ML call at some point to work all this out...
I would suggest MLDataUtilities as a name. Let's make it explicitly about working with data. There's no reason the release cycle/maturity of the package should be the same for data vs. losses, etc.
So the plan could be to finish the port in this repo where I'm gradually reviewing and modernizing the whole codebase, then push the whole thing into https://github.com/JuliaML/MLDataUtils.jl, tag a breaking release, archive MLDataPattern and MLLabelUtils. What do you think?
Are there any communication channels open between JuliaML and JuliaStats? MLBase has essentially seen 0 development since 2018, so it would be good to know if anyone still has plans for that package.
Not that I know of. From what I've seen, these kinds of coordination efforts are complicated by the general lack of interest, original authors moving away, etc. We can try, though.
So the plan could be to finish the port in this repo where I'm gradually reviewing and modernizing the whole codebase, then push the whole thing into https://github.com/JuliaML/MLDataUtils.jl, tag a breaking release, archive MLDataPattern and MLLabelUtils. What do you think?
This sounds good to me. So this repo is mostly for testing things out away from the main repo. In fact, I can help port stuff here... it will be much easier than updating MLDataPattern.jl. In fact, I just wasted a day's worth of work updating DataSubset only to find out that something in shuffleobs broke related to obsdim. I would much rather spend my time working on a package that doesn't have these corner cases woven through it.
My suggestion, then, would be that I revert our recent changes to LearnBase.jl and drop the PRs to MLDataPattern.jl and MLLabelUtils.jl. We can then make a release of the three that restores the old state. Right now LearnBase.jl and MLDataPattern.jl are in conflict. Once this is done, I can help port stuff over here. Does this sound good?
@johnnychen94 do you have any thoughts on this plan?
If our goal here is to support Flux with JuliaML utilities, starting a new repo like this is a smoother plan than upgrading MLDataPattern and LearnBase in a compatible way, since the latter has proven overly difficult over the past year (sorry that I didn't help with any of the PRs). It doesn't necessarily need to be here; it can be hosted in FluxML if that attracts more attention.
For myself, because I'm in the last 1.5 years of my Ph.D. career I need to focus more on my thesis, thus I don't really have much free time to contribute to Julia. I think I'll limit myself to JuliaImages development but I'll be very happy to give feedback whenever requested.
If you feel that the name MLBase is the best one, we could probably transfer it from JuliaStats to JuliaML. AFAICT that package has been almost dead for years.
What is the status of the plan now?
Functionality from LearnBase and MLDataPattern has been absorbed. MLLabelUtils and MLDataUtils are still to be imported. When that is done, I think the current plan is to replace the content of MLDataUtils with the content of this repo. I would also be OK with just tagging this package (even right now).
What about the suggestion above of moving MLBase to this org and tagging a major release? Wouldn't it be a better way forward?
Also fine, unless we think that MLBase feels too "basic" for something that also contains e.g. a DataLoader, or that we want a name containing "data" and limit the scope to data manipulation utilities (e.g. no loss functions and no metrics). I think I lean toward having only one beefy (but not too beefy) package, MLBase. I think StatsBase is also not so lean.
Yes, my personal opinion is that a single package is the best option. It is super hard to advance with features when multiple packages, each with a tiny set of features, are developed in parallel. In the future, if the package becomes too heavy in terms of dependencies, then it makes sense to split.