DomainSets.jl
DomainSets.jl copied to clipboard
Getting the domain/support of objects, and domain type hierarchy
We currently don't have a common function to query the domain or support of something (a distribution, measure, ML model, etc.) yet, right? We could add something like getdomain(something)
and getsupport(some_function)
For functions, the situation is more tricky since their domain/support will often not only depend on the argument type, but also on the actual shape of the argument (array sizes, etc.). We could do getdomain(f, shp::ValueShapes.AbstractValueShape)
for the kind of stuff ValueShapes currently covers (could extend that).
Distributions currently offers support
, but it's limited to univariate distributions. Having getdomain
/getsupport
in DomainSets would allow packages to add support for complex types of domains in a clean fashion without depending on DomainSets.
Would a getdomain
/getsupport
API be welcome in DomainSets (I could do a PR)?
(Edit: Also some discussion about domain/set type hierarchies further down.)
For functions, the situation is more tricky since their domain/support will often not only depend on the argument type, but also on the actual shape of the argument (array sizes, etc.)
Can you give a concrete example?
The ones I can think of (eg matrix functions with sqrt(A)
) have a very complicated domain and so one would need a special type to represent the domain that could also have variable dimensions cooked in, eg for domain(sqrt, ::Symmetric{Float64,Matrix{Float64}})
would return something like SymmetricSemipositiveDefiniteDomain{Float64}()
.
Distributions currently offers
support
, but it's limited to univariate distributions. Havinggetdomain
/getsupport
in DomainSets would allow packages to add support for complex types of domains in a clean fashion without depending on DomainSets.
Hmm, I don't quite get one you could extend a function defined in DomainSets
without depending on DomainSets
(since you say "without depending on"). I'm all for the idea, but how do you see this?
@daanhb : Hmm, I don't quite get one you could extend a function defined in DomainSets without depending on DomainSets (since you say "without depending on"). I'm all for the idea, but how do you see this?
Sorry, I should have said "without a hard dependency in it". I meant via package extensions (falling back to a direct dependency or Requires on Julia <v1.9).
For functions, the situation is more tricky [...] @dlfivefifty : Can you give a concrete example?
I was thinking about functions like sum
, normalize
, var
and so on, functions that can take lots of different inputs, and are often extended by other packages to even more inputs. And then there's functions like identity
that will accept anything ...
For functions like log
, exp
, relu
, etc. it would be very easy to define getdomain(f)
and getsupport(f)
, of course. Maybe limiting ourselves to such cases would be enough for now? Would be helpful to add domains like ℝ
and ℝ₊
for that.
I think you need to include the type in domain
and support
(Btw I do not like the use of "get" as its not really used in Julia, apart from getindex
)
For example, the domain of sqrt
depends on whether its real or complex. Eg I'd imagine something like the following (@daanhb why is ℂ
limited to ComplexF64
?):
domain(::typeof(sqrt), ::T) where T<: Real = HalfLine{T}()
domain(::typeof(sqrt), ::T) where T<: Complex = ℂ / NegativeOpenHalfLine{T}()
Btw I do not like the use of "get" as its not really used in Julia, apart from getindex
How about domainof(obj)
and supportof(obj)
then?
I think you need to include the type [...] For example, the domain of sqrt depends on whether its real or complex.
I had that in my initial draft of this proposal, but then I thought that maybe giving the tpe is too much and too little - for distributions and measures, we don't need it, they have a well-defined and unique domain/support. For sqrt
adding the type would help, but for functions that take values that include arrays one would need both type and size (ValueShapes offers a solution here, but it' s currently limited in what it can handle).
So we could, initially, limit ourselves to cases that are clear-cut? Or we could do domainof(obj, T)
and allow, e.g. for distributions and measures, something like domainof(obj) = domainof(obj, testvalue(obj))
?
The analogue of "domain" for arrays is just called axes
. What's wrong with just using domain
?
What's wrong with just using
domain
?
I'm not against just domain
. But I think we should also have something to get support, and Distributions already "owns" support
. Also, I could see people wanting to use domain
as a variable name in code that deals with domains, so maybe domainof
wouldn't be a bad choice?
I think let users use namespaces (DomainSets.domain etc) if they need to avoid confusion. Users also use size, axes, length etc. as names for variables so this isn’t a good argument
I think we could move Distributions.support and domain to a simple package (DomainsCore or something) that just defines these two functions.
Ownership of functions is certainly an issue here. Is there any consensus about using package extensions rather than packages like DomainSetsCore or JuliaApproximationCore? I'm starting to think we could meaningfully do with both (and that support
or whatever it's called eventually should go into the latter and not the former.)
I think we could move Distributions.support and domain to a simple package (DomainsCore or something) that just defines these two functions.
In general I like that approach, having lightweight API-only packages, I think it enables us to take full advantage of multiple dispatch across packages. In this specific case though, I'm not sure - Distributions would then actually have to have a hard dependency on DomainSets - it couldn't implement domain(d)
via a Pkg extension, since that would lead to unpredictable behaviour: domain(d)
could only return a result if DomainSets is loaded, but users could request domain(d)
without loading it.
I guess Distributions wouldn't accept DomainSets as a dependency, because it pulls in StaticArrays, which Distributions has avoided so far.
I think let users use namespaces (DomainSets.domain etc) if they need to avoid confusion.
I agree in general, but with major packages like Distributions I think one should try to avoid name clashes.
are you worried about type piracy? It’s a pretty standard idiom for packages to implement functionality defined in a Core package. So donain(d) would just error and say “load DomainSets.jl” if it’s not loaded
are you worried about type piracy? It’s a pretty standard idiom for packages to implement functionality defined in a Core package.
Yes, that wouldn't be a problem, since there would be a clear contract between "DomainSetsCore" and DomainSets.
So donain(d) would just error and say “load DomainSets.jl” if it’s not loaded
But it wouldn't always, from the user point of view: not if DomainSets is loaded implicitly via some other dependency path. And if DomainSets should vanish from that (possibly deep) dependency path, then the code that worked before would break, even if upper version bounds are used correctly. I know we have this in some corners of the ecosystem, but I don't think it's a good pattern. We do require code to state all of it's dependencies explicitly in Julia (unlike Python), and I think that has been very beneficial to the stability of the ecosystem.
How about this (if the Distributions maintainers are Ok with it):
-
We add
domain
andsupport
here, DomainSets owns them. -
Distributions adds support via a Pkg extension (same for MeasureBase and possibly other packages). Distributions deprecates it's own
support
immediately, pointing users toDomainSets.support
. Distributions.support returns the same as before for the distributions where it works now (basically just univariate dists), theDomainSets.support
will return the same for these. -
At the next breaking release of Distributions (may be a quite a while), Distributions removes it's own
support
function.
CC @devmotion, I think we should ask you for some input here as well.
It seems Distributions.jl has its own interval type called RealInterval
. That could be implemented using just IntervalSets, which is very lightweight.
Somewhere in the near future possibly both IntervalSets and DomainSets will depend on DomainSetsCore, which actually defines a function called domain
for now.
Somewhere in the near future possibly both IntervalSets and DomainSets will depend on DomainSetsCore, which actually defines a function called domain for now.
Thanks @daanhb , that's great! Could DomainSetsCore also provide a function support
or similar?
This got me thinking, though - just a proposal, I don't mean to be presumptuous: if we change a few things around anyway, should we maybe start dropping the term "domain" when it comes to the type of the sets themselves, in the future? And use terms like "domain" and "support" only in the name of functions that get such sets for objects/functions?
At the moment, we have RealLine <: FixedInterval <: TypedEndpointsInterval <: AbstractInterval <: Domain
. But mathematically, the set of real numbers by itself is just a set. It is the domain of some functions, and the support of some functions, and it's the co-domain of some functions. But the real numbers, intervals, etc. are not domains intrinsically, by themselves. It's not what they are, it's a role they play in certain contexts. Maybe we could rename Domain
to something like InfiniteSet
at the top of the type hierarchy? It could possibly even be a subtype of Base.AbstractSet
- after all, in some cases we might want to use a finite Base.Set
for domains, supports and co-domains.
I like the name InfiniteSet
. The problem with InfiniteSet <: AbstractSet
is that types in Julia are more about conforming to an interface than mathematical definitions. Will an InfiniteSet
actually conform to the rules governing AbstractSet
? And these rules are not really written down anywhere...
Note I added support to Infinite arrays in Julia see
https://github.com/JuliaLang/julia/blob/master/test/testhelpers/InfiniteArrays.jl
So I'm all for allowing infinite sets but we just need to be more careful and possibly add a testhelper
to ensure the expected functionality is maintained.
Hmm, I did not see a discussion about names coming up here. Points, to name just one example, are not infinite sets. And if I may short-circuit the discussion, inheritance from AbstractSet
seems to bring mostly complications without tangible benefits. (The "is a" relationship between objects is not a good enough reason for inheritance, in my experience so far with Julia. I agree with @dlfivefifty that similar behaviour, i.e. adhering to an interface, matters more for generic programming.)
The word domain is used in many meanings, only one of which is "the domain of a function". The word is in itself also associated with sets. Some take it to mean connected open subsets, though even then people also talk about closed domains. In other contexts one might solve a PDE "on a domain", in which case it could be nearly anything. In complex analysis there are regions (e.g. there is a ComplexRegions package). And so on. There is no single good word.
Back to the issue at hand: I think that while DomainSets could be used to describe the domain or the support of a function (which to me is a subset of its domain on which it is nonzero), it should not define the concept as the package itself is not about functions. It is a coincidence that a function with the same name "domain" is defined in DomainSetsCore. I see no conceptual inconsistency in extending the meaning of that function to mean "the domain of a function" when applied to function objects. A function like "support" on the other hand might live in JuliaApproximationCore :-)
How about this (if the Distributions maintainers are Ok with it):
* We add `domain` and `support` here, DomainSets owns them. * Distributions adds support via a Pkg extension (same for MeasureBase and possibly other packages). Distributions deprecates it's own `support` immediately, pointing users to `DomainSets.support`. Distributions.support returns the same as before for the distributions where it works now (basically just univariate dists), the `DomainSets.support` will return the same for these. * At the next breaking release of Distributions (may be a quite a while), Distributions removes it's own `support` function.
My feeling is that support
(or domain
or whatever its name is) is such a fundamental part of Distributions that it shouldn't be made optional and shouldn't depend on what packages users load (im- or explicitly).
Thanks for the clarifications @daanhb !
Hmm, I did not see a discussion about names coming up here.
No, sorry - maybe I should have opened a separate issue. In any case, I certainly didn't mean to seem pushy here.
I was just thinking about how we can express things like domains and supports better in packages like MeasureBase/MeasureTheory (they currently use some custom types). And about possible proposals in regard to Distributions to get more consistency in this area. IntervalSets + DomainSets seemed like a good fit.
In any case, from there on I just thinking about "is a" vs. "role in relationship to" in regard to the terms "domain" vs. "set" in the type hierarchy. And I saw that the Readme begins with "DomainSets.jl is a package designed to represent simple infinite sets". So I thought maybe the term "domain" in the type hierarchy had historical reasons.
I agree with @dlfivefifty that similar behaviour, i.e. adhering to an interface, matters more for generic programming
Sure, generic programming in Julia doesn't strictly need supertypes. Taking full advantage of multiple dispatch, though, in my experience, gets much easier with common supertypes (where would we be without AbstractArray
?) - especially in larger (systems of) packages/applications. Just my personal view, I know this is sometimes a contentious topic. Certainly subtyping Base classes can also get tricky and result in complex dispatch decisions - I had hoped AbstractSet
would be "benign" in this respect (there are some generic methods in Base
that seem to assume that an AbstractSet
is finite, but maybe not too many to handle?). But I agree, it's something that has to be weighed carefully.
Again, I certainly didn't mean to push too far here.
A function like "support" on the other hand might live in JuliaApproximationCore
In other contexts (InverseFunction) we've been talking about the need for more central APIs to query function properties, like maybe a "FunctionTraits" package, for functions like ismonotonic
, islinear
, iscontinuous
and so on. Maybe such a package could be a good place for function to query domain and support as well? In contrast to questions about linearity, which have a yes/no answer, we should have a community consensus on what types should be returned here, though. Do you think it's within the scope of DomainSets to play such a role (where applicable)?
@devmotion : My feeling is that support (or domain or whatever its name is) is such a fundamental part of Distributions that it shouldn't be made optional and shouldn't depend on what packages users load (im- or explicitly).
Thanks for the feedback @devmotion ! Question: Are there currently any plans to extend Distributions.support
to multivariate distributions (and how to represent the result)? And would you feel comfortable with support
being hosted in very lightweight central API-def package?
@dlfivefifty: Will an InfiniteSet actually conform to the rules governing AbstractSet? And these rules are not really written down anywhere.
Good point. We could suggest defining that more closely in the language docs? :-) So far I haven't seen anything to "exotic" with AbstractSet
"in the field", so maybe it's not too late ... ;-)
Note I added support to Infinite arrays in Julia see
❤️
In other contexts (InverseFunction) we've been talking about the need for more central APIs to query function properties, like maybe a "FunctionTraits" package, for functions like
ismonotonic
,islinear
,iscontinuous
and so on. Maybe such a package could be a good place for function to query domain and support as well? In contrast to questions about linearity, which have a yes/no answer, we should have a community consensus on what types should be returned here, though. Do you think it's within the scope of DomainSets to play such a role (where applicable)?
It would be great to see some common ground here! Apart from interfaces there is a lot of duplicated effort, including in DomainSets, especially for affine maps. I was hoping to be able to move everything related to functions and maps into a separate package (#92), or to use existing packages.
Sure, generic programming in Julia doesn't strictly need supertypes. Taking full advantage of multiple dispatch, though, in my experience, gets much easier with common supertypes (where would we be without
AbstractArray
?) - especially in larger (systems of) packages/applications. Just my personal view, I know this is sometimes a contentious topic. Certainly subtyping Base classes can also get tricky and result in complex dispatch decisions - I had hopedAbstractSet
would be "benign" in this respect (there are some generic methods inBase
that seem to assume that anAbstractSet
is finite, but maybe not too many to handle?). But I agree, it's something that has to be weighed carefully.
Sure, dispatching on a common supertype is beneficial, as is being able to reuse code written for abstract types. I just think both of these advantages are limited in the context of domains, and we've actively been trying to get away from having a common type. (It would be a funny way to settle the eltype
debate.) Yet it's good to look at the methods in Base for sets, thanks - it seems we've been missing out on opportunities to be more consistent (like using "issetequal", which seems spot on).
It would be great to see some common ground here! Apart from interfaces there is a lot of duplicated effort, including in DomainSets, especially for affine maps.
I'm so with you, here! I wish we have a AbstractLinearOperator <: AbstractMultiplicativeOperator
, that people would agree to use, in a central place. Then we wouldn't have to use untyped arguments in algorithms (like solvers) that accept "anything that does a linear mul" all the time (resulting in later failure if users pass the wrong thing with less clear error messages, less possible specialization, and so on).
context of domains, and we've actively been trying to get away from having a common type
Do we really have to, though? Like @dlfivefifty says, we even have infinite arrays now - and AbstractArray
make a lot of assumptions, and has a lot of generic default implementations. Maybe it would be worth a try to see if we can stretch AbstractSet
to infinite sets, and other special sets, without getting into (too much) dispatch trouble? (Or have you guys been down that road before and it didn't work out, maybe?)
I'm just thinking about how far AbstractArray
has carried us - we have arrays that you can only read (like mapped arrays), arrays without an efficient single-element getindex (like GPU arrays), arrays without a fixed or even finite size, and so on - but the fact that they're all AbstractArray
s has enabled an incredible amount of generic code. If we had independent type hierarchies, it would be a mess of Union
s, even more package extensions, and so on - just my take. That why I wonder if we shouldn't give AbstractSet
a chance here, so to speak.
There are other cases where we don't have a common supertype, for good reason. Common supertypes for iterable objects for example - and we shouldn't have one, iterability isn't always the primary property of an object. So iterate
is "just" a (very successful) API without a type hierarchy.
So I fully agree: Just because objects implement a common API that doesn't automatically make them candidates for a common type hierarchy. We don't have multiple inheritance or similar in the language and should choose super-types very carefully. But, to take full advantage of multiple dispatch we should build common type hierarchies where they also make sense semantically, and where we don't run into "but it's just a much an A as a B" situations.
In this case I would argue that the primary property of a set, and the domains and things we talk about here, is their, well, "set-likeness". :-) That's why, IMHO, building down from AbstractSet
would make sense not just technically, but also semantically.
The issue is that looking at https://github.com/JuliaLang/julia/blob/master/base/abstractset.jl there's basically no code that will still work for infinite sets 😅
Here's an experiment to detect the interface:
julia> struct MySet <: AbstractSet{Int} end
julia> Set(MySet())
ERROR: MethodError: no method matching length(::MySet)
julia> Base.length(::MySet) = 1
julia> Set(MySet())
ERROR: MethodError: no method matching iterate(::MySet)
julia> Base.iterate(::MySet, k...) = iterate((1,), k...)
julia> Set(MySet())
Set{Int64} with 1 element:
1
So the interface for AbstractSet
is:
-
length
-
iterate
-
in
(probably.... not needed in the example above) -
eltype
(this is automatic)
So the issue for domains is that we lose (2) and potentially (4) no longer makes sense. We could make a major PR into Julia to add overloading say intersect
or union
when length
is infinite... I think that will be more difficult than the minimal changes I did for InfiniteArrays.jl (and there were complaints about it being merged when some Julia developers realised I had added Base.oneto
without them knowing!)
We could take the ArrayLayouts.jl approach and have a parallel implementation of everything. I'm not sure I'd recommend it either.
@dlfivefifty: there's basically no code that will still work for infinite sets 😅
Sure, but abstract types don't necessarily need to provide generic implementations/methods for the functions associated with them, right? What there should be, is contracts/semantics of the interface (and yes, we should write those down explicitly way more often :-) ).
If you build an InfiniteArray
like the MySet
in the example above, pretty much none of the default methods will work either. ;-)
And sure, infinite sets won't support length
and iterate
, but neither do infinite arrays (at least not iteration in finite time), and usually length and iteration are assumed be quite fundamental for arrays, I'd say. :-) But what all arrays that we have do support is random access: getindex
(of single elements or ranges of elements). It's what makes them fundamentally different from sequences.
I would say it's very similar for sets: Finite sets have a length and support iteration, infinite sets don't. But what they all share, what makes them sets, is the ability to check if they contain an element in a (somewhat) efficient manner: Base.in
. I'd say it is for sets what getindex
is for arrays. Arrays have a (somewhat) efficient getindex
, sets have a (somewhat) efficient in
. At least that how I would define it.
And of course we'll want to support operations like union
- where possible. We won't be able to implement that for all possible combinations of sets in a sensible way, but then we can't implement vcat
for all possible combinations of arrays very sensibly either.
(I should actually be pretty easy to add a generic union
implementation for AbstractSet
s to Base, just using in
, and retrofit it via Compat. Generic unions could even be made iterable if their constituents are. Generic intersects would also be easy, no way to get default iteration there, though.)
(We could even define iteration for certain infinite sets, for hypercubes for example, we could use quasirandom sequences like Sobol to implement infinite iteration. That could actually be useful in practice.)
Do we really have to, though? Like @dlfivefifty says, we even have infinite arrays now - and
AbstractArray
make a lot of assumptions, and has a lot of generic default implementations. Maybe it would be worth a try to see if we can stretchAbstractSet
to infinite sets, and other special sets, without getting into (too much) dispatch trouble? (Or have you guys been down that road before and it didn't work out, maybe?)
You raise good points but one can argue the other way here. It's great that the community can agree on using AbstractArray
as a common supertype, but it has side-effects. Plenty of objects can be thought of as an array in a sense, yet have another more logical supertype. Images come to mind (this goes way back). Not inheriting from AbstractArray
makes it hard to use the nice indexing logic that is in Base.
In this case I would argue that the primary property of a set, and the domains and things we talk about here, is their, well, "set-likeness". :-) That's why, IMHO, building down from
AbstractSet
would make sense not just technically, but also semantically.
To be fair, many packages out there that define triangles and rectangles do not actually have a definition of in
. The type represents the concept of the set, but really just stores the data to define it (and may have some other supertype). The domain interface at least enables someone else (crucially not per se the package developers themselves) to add the interpretation of a set, and allows interoperability with other objects having the same interpretation. Having an AbstractSet
with a design elaborate enough to satisfy many use cases might be very nice but will never cover everything, something else is needed anyway.