Distributions.jl
Distributions.jl copied to clipboard
Release Distributions.jl v1.0
I'm eager to have the guarantees of semver for packages I use as dependencies. What needs to happen before this package could be considered 1.0 and released as stable? From the outside it seems like it might already be ready for that, but I don't have a good view on any breaking changes that may still need to be made or essential functionality that's missing.
#823 is definitely a big breaking change.
@non-Jedi thanks for this issue.
- [ ] As mentioned, https://github.com/JuliaStats/StatsBase.jl/issues/493 is a big one. Once done, we have to migrate all StatsBase functions
- [ ] #823 made us realize the tests are in a bad state, we should remove JSON-based tests and replace them with generated Julia files, will be much faster
- [ ] Improved documentation for sampler and distribution interfaces. In the current state, it's very hard to know what is required and what is optional for implementing new distributions. The doc should be improved for that.
- [ ] if done at all, #844 needs to be done before freezing
- [ ] there are some design issues with mixtures
- [ ] removing dependencies on PDMats would make the lib way more lightweight cc @andreasnoack https://github.com/JuliaStats/Distributions.jl/issues/688
Given how much work that already is on the plain stuff which would make Distributions 1.0-ready, I tend to think #823 is optional there, if not present, it should be a 2.0 milestone
What's needed for GPU support?
* [](https://github.com/JuliaStats/Distributions.jl/issues/688)
Given how much work that already is on the plain stuff which would make Distributions 1.0-ready, I tend to think #823 is optional there, if not present, it should be a 2.0 milestone
It would be weird to tag 1.0 with a large PR that everybody is looking forward. At least, deprecations could be made so that the PR can be merged without breaking anything in 1.x.
OTC, refactoring tests doesn't affect the API, so I don't see what it should be on the 1.0 roadmap. Same for documentation.
@datnamer
What's needed for GPU support?
I am not sure what you mean. In doubt, I'd say generic containers everywhere instead of hardcoded Float64
Refactoring tests is a major point blocking #823. I was considering 1.0 not only from the semver point of view, but also as signal of readiness and robustness of the library. Documentation here is also documenting clearly what is required as interface for a distribution, otherwise any removal afterwards is breaking
Refactoring tests is a major point blocking #823.
Not really: we could merge #823, it's just that the keyword constructors aren't really being tested.
Not really: we could merge #823, it's just that the keyword constructors aren't really being tested.
Not testing the keyword constructor would be problematic, we would merge something we are not able to test against moving forward.
Ok for #823 before 1.0 yes, even if we don't need to see 2.0 as a super-distant milestone, unlike the Julia project itself. DifferentialEquations has been moving on with major releases
@non-Jedi if you want to pick up some issues from there feel free :) I tagged a bunch of issues as up-for-grabs
Also: @cscherrer, this one is still open and could potentially lead to breaks https://github.com/JuliaStats/Distributions.jl/issues/771
Please consider also #911, which I think could also lead to breaking changes.
It's not major, it could lead to breaking change but nothing awful, and always fixable by specifying types
Not if you add a deprecation fallback, but anyway yes this will be better done before 1.0 to be able to break it
As far as breaking changes are concerned before 1.0, can I also put in a vote for design issues with mixture models - specifically, but not only, combining discrete and continuous distributions to get discontinuous distributions - I'm gathering thoughts in #925 (and on slack). Also fixing discrete distributions having to be Ints rather than any countable set - see #916 for example - will be mildly breaking.
#916 would be breaking since it modifies the existing DiscreteNonParametric
distribution. But we could add a new distribution, separate from the existing DiscreteNonParametric
, that allows the support to be any countable set. And that wouldn't be breaking, if it were a new distribution.
But in any event, extending Discrete
to allow any countable set rather than just Int
s (which is the current spec) and Real
s rounded or floored to Int
s would be breaking (albeit in ways that hopefully no-one is using), wouldn't it? I know you're only doing part of that at the moment, you're just doing non-Number
s, but it seems a logical thing to do too, and any code that depends on discrete distributions being integer-based will break anyway...
@datnamer
What's needed for GPU support?
I am not sure what you mean. In doubt, I'd say generic containers everywhere instead of hardcoded Float64
Also it seems that a common use case in deep learning is not effeciently handled by the current design. Say we have a vector of Isotropic Gaussian vg = [MvNormal(mu1, sig1), MvNormal(mu2, sig2), ...]
and I'd like to draw one sample from each Gaussian rand.(vg)
. This operation is much faster if all mu
s and sig
are concatenated into matrix and the sampling needs only be done once via matix operations, which is fast in GPU. I think this is a general pattern in deep learning and could be accelerated by GPUs. Any idea if this kind of "GPU batch operation" would be somehow supported?
@xukai92 would this imply a large sparse covariance matrix of the stacked random variables?
Yes I believe this is one way to say it.
My vote would be to just release 1.0.0 as is, so that we get the full richness of semver going forward, it just makes the life of downstream packages so much easier.
I wouldn't mind at all if there is a breaking 2.0.0 release soon after, and if there is a need for 3.0.0, well, then that can come as well. I would not treat 1.0.0 as anything special, other than unlocking more features from the package manager.
I opened a new issue for GPU-related support ( #1067 ) to coalesce the steps it would take to get the package there. Happy to start contributing towards this goal once the community discusses a bit.
Where are we on this?
If we tagged v1.0, I think the only things likely to make us release v2.0 shortly after are https://github.com/JuliaStats/Distributions.jl/pull/823 and https://github.com/JuliaStats/Distributions.jl/pull/951
Those are both large PRs with very little progress on them in the last 6 months (which is fair enough -- they'll be nice when we do get them over the line). I work on a couple of (private) packages that have compat entries like Distributions = "0.16, 0.17, 0.18, 0.19, 0.20, 0.21, 0.22"
, so I think we're in a good place regarding stability.
@matbesancon what do you thinking about tagging v1? What would be your main concerns?
@nickrobinson251 thanks for keeping up on this. I tend to agree with your point, the two PR mentioned have been stalled and I'd be fine with keeping them for a 2.0. I think one last breaking PR is https://github.com/JuliaStats/Distributions.jl/pull/1079 I would be for releasing a 0.23 with this last PR incorporated, along with any other imminently mergeable, and then 1.0 with all the deprecations removed
What do you think? Also my timetable is a bit tight these days, and I feel like it's the case for many other maintainers, I'd appreciate any help on checking what has to be done before 1.0 :)
I stopped working on #951 after several months a while ago because it wasn't clear it was going to be merged to be honest (and I had too much teaching at the time). I think it's the right thing to do - ValueSupport
needs to be extended to handle more distributions, and there are several PRs that address that kind of issue in specific cases, but I spent a lot of time on it but it felt (to me) like it wasn't going to be merged, it was just going to drift on taking time for ever. I'd be very happy if I was wrong. If the repository owners are still interested and will merge it I can work it up again? Even for 1.0 if it's desirable as it is lightly breaking...
Regarding 1.0. Is there a decision on what to do about issues like https://github.com/JuliaStats/Distributions.jl/issues/1071? In general, how to allow for different return types of random numbers generated from a distribution?
This was one of the things solved by #951 - it imputed desired return types from parameter types. If that's the right thing to do of course!
Bump. Lack of a 1.0 release is causing real pains for people: https://github.com/JuliaRegistries/General/pull/33273.
It might be worth making two separate lists of:
- What breaking changes are actually blocking a 1.0 release?
- Breaking changes that could wait until a 2.0 or 3.0 release.
Remember: integers are free. There's not harm in releasing 1.0 now and releasing 2.0 whenever it's ready, be it next year or whatever. Given how long we've been waiting for 1.0, seems reasonable to make a 1.0 release sooner than later.
I agree it's probably a good idea to tag 1.0. But often the problem is that no breaking changes have been merged, so that moving to 1.0 would require all dependencies to bump their version bounds even though nothing will break. That's the 1.0 paradox. :-)