DomainSets.jl: Getting the domain/support of objects, and domain type hierarchy

We currently don't have a common function to query the domain or support of something (a distribution, measure, ML model, etc.) yet, right? We could add something like getdomain(something) and getsupport(some_function).

For functions, the situation is more tricky since their domain/support will often not only depend on the argument type, but also on the actual shape of the argument (array sizes, etc.). We could do getdomain(f, shp::ValueShapes.AbstractValueShape) for the kind of stuff ValueShapes currently covers (could extend that).
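Here is a minimal sketch of what a shape-aware query could look like. It assumes a hypothetical `getdomain(f, T, dims...)` signature and a hypothetical `ArraySpace` type that stands in for what ValueShapes would describe. Neither is an existing API. For `sum`, the caller has to supply the element type and the array size, because the function itself accepts arrays of any size.

```julia
# Hypothetical: the set of all arrays with element type T and a fixed size.
struct ArraySpace{T,N}
    dims::NTuple{N,Int}
end
ArraySpace{T}(dims::Vararg{Int,N}) where {T,N} = ArraySpace{T,N}(dims)

# Membership needs the element type, the dimensionality and the exact size.
Base.in(x, s::ArraySpace{T,N}) where {T,N} =
    x isa AbstractArray{<:T,N} && size(x) == s.dims

# Hypothetical shape-aware query: `sum` accepts arrays of any size, so the
# caller has to say which element type and size it is asking about.
getdomain(::typeof(sum), ::Type{T}, dims::Vararg{Int}) where {T<:Number} =
    ArraySpace{T}(dims...)

# Usage
d = getdomain(sum, Float64, 2, 3)
rand(2, 3) in d   # true
rand(3, 2) in d   # false
```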

Distributions currently offers support, but it's limited to univariate distributions. Having getdomain/getsupport in DomainSets would allow packages to add support for complex types of domains in a clean fashion without depending on DomainSets.

Would a getdomain/getsupport API be welcome in DomainSets (I could do a PR)?

(Edit: Also some discussion about domain/set type hierarchies further down.)

Nov 20 '23 09:11 oschulz

For functions, the situation is more tricky since their domain/support will often not only depend on the argument type, but also on the actual shape of the argument (array sizes, etc.)

Can you give a concrete example?

The ones I can think of (e.g. matrix functions like sqrt(A)) have a very complicated domain, so one would need a special type to represent the domain that could also have variable dimensions baked in; e.g. domain(sqrt, ::Symmetric{Float64,Matrix{Float64}}) would return something like SymmetricSemipositiveDefiniteDomain{Float64}().
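The following minimal sketch makes the example more concrete. It assumes a hypothetical `domain(f, x)` function and reuses the hypothetical name `SymmetricSemipositiveDefiniteDomain` from above. Neither is an existing DomainSets API. Membership is checked numerically through eigenvalues from the LinearAlgebra standard library, and the tolerance is an arbitrary choice.

```julia
using LinearAlgebra

# Hypothetical domain type: symmetric positive semidefinite matrices with element type T.
struct SymmetricSemipositiveDefiniteDomain{T<:Real} end

function Base.in(A::AbstractMatrix, ::SymmetricSemipositiveDefiniteDomain{T}) where {T}
    M = Matrix(A)
    issymmetric(M) || return false
    # Tolerance (an arbitrary choice) so that round-off in the eigenvalues of a
    # singular PSD matrix does not reject it.
    tol = sqrt(eps(float(T))) * max(1, opnorm(M))
    return minimum(eigvals(Symmetric(M))) >= -tol
end

# Hypothetical `domain(f, x)` method: the real square root of a symmetric matrix.
domain(::typeof(sqrt), ::Symmetric{T,<:AbstractMatrix{T}}) where {T<:Real} =
    SymmetricSemipositiveDefiniteDomain{T}()

# Usage
S = Symmetric([2.0 1.0; 1.0 2.0])
D = domain(sqrt, S)
S in D                     # true: eigenvalues 1 and 3
[1.0 2.0; 2.0 1.0] in D    # false: eigenvalues -1 and 3
```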

Nov 20 '23 11:11 dlfivefifty

Distributions currently offers support, but it's limited to univariate distributions. Having getdomain/getsupport in DomainSets would allow packages to add support for complex types of domains in a clean fashion without depending on DomainSets.

Hmm, I don't quite get how you could extend a function defined in DomainSets without depending on DomainSets (since you say "without depending on"). I'm all for the idea, but how do you see this?

Nov 20 '23 11:11 daanhb

@daanhb : Hmm, I don't quite get how you could extend a function defined in DomainSets without depending on DomainSets (since you say "without depending on"). I'm all for the idea, but how do you see this?

Sorry, I should have said "without a hard dependency on it". I meant via package extensions (falling back to a direct dependency or Requires on Julia < v1.9).

Nov 20 '23 14:11 oschulz

For functions, the situation is more tricky [...] @dlfivefifty : Can you give a concrete example?

I was thinking about functions like sum, normalize, var and so on: functions that can take lots of different inputs and are often extended by other packages to even more inputs. And then there are functions like identity that will accept anything...

For functions like log, exp, relu, etc. it would be very easy to define getdomain(f) and getsupport(f), of course. Maybe limiting ourselves to such cases would be enough for now? Would be helpful to add domains like ℝ and ℝ₊ for that.
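For clear-cut cases like these, a single-argument query could be as simple as the following sketch. `Interval`, `ℝ`, `PositiveReals`, `relu` and `getdomain` are all hypothetical stand-ins defined here. IntervalSets and DomainSets provide real interval types, and `ℝ` and `PositiveReals` stand in for the ℝ and ℝ₊ mentioned above.

```julia
# Hypothetical minimal interval type (IntervalSets/DomainSets provide real ones).
struct Interval{T<:Real}
    left::T
    right::T
    leftopen::Bool
    rightopen::Bool
end

function Base.in(x::Real, I::Interval)
    left_ok = I.leftopen ? x > I.left : x >= I.left
    right_ok = I.rightopen ? x < I.right : x <= I.right
    return left_ok && right_ok
end

# Hypothetical named domains: the real line and the strictly positive reals.
const ℝ = Interval(-Inf, Inf, true, true)
const PositiveReals = Interval(0.0, Inf, true, true)

relu(x::Real) = max(zero(x), x)   # hypothetical helper; relu is not in Base

# Hypothetical single-argument domain queries for clear-cut functions.
getdomain(::typeof(exp)) = ℝ
getdomain(::typeof(log)) = PositiveReals
getdomain(::typeof(relu)) = ℝ

# Usage
1.0 in getdomain(log)   # true
0.0 in getdomain(log)   # false
```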

Nov 20 '23 14:11 oschulz

I think you need to include the type in domain and support. (Btw, I do not like the use of "get", as it's not really used in Julia apart from getindex.)

For example, the domain of sqrt depends on whether it's real or complex. E.g. I'd imagine something like the following (@daanhb, why is ℂ limited to ComplexF64?):

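```julia
# Real arguments: the closed half-line [0, ∞)
domain(::typeof(sqrt), ::T) where {T<:Real} = HalfLine{T}()
# Complex arguments: the complex plane minus the negative real axis (set difference)
domain(::typeof(sqrt), ::T) where {T<:Complex} = ℂ \ NegativeOpenHalfLine{T}()
```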
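Base itself shows why the type argument matters: `sqrt(-1.0)` throws a `DomainError`, while `sqrt(complex(-1.0))` returns `0.0 + 1.0im`. The dependency-free sketch below mirrors the two methods above. It uses hypothetical stand-in types (`NonnegativeReals`, `SlitPlane`) rather than the DomainSets types. As in the sketch above, the complex case excludes the open negative real axis, which is where the branch cut lies.

```julia
# Hypothetical stand-in for the closed half-line [0, ∞) over the real type T.
struct NonnegativeReals{T<:Real} end
Base.in(x::Real, ::NonnegativeReals) = x >= 0

# Hypothetical stand-in for ℂ minus the open negative real axis (-∞, 0).
struct SlitPlane{T<:Complex} end
Base.in(z::Number, ::SlitPlane) = !(isreal(z) && real(z) < 0)

# The domain depends on the argument type, not just on the function.
domain(::typeof(sqrt), ::T) where {T<:Real} = NonnegativeReals{T}()
domain(::typeof(sqrt), ::T) where {T<:Complex} = SlitPlane{T}()

# Usage
-1.0 in domain(sqrt, 2.0)              # false
(-1.0 + 1.0im) in domain(sqrt, 2.0im)  # true
```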

Nov 20 '23 15:11 dlfivefifty

Btw, I do not like the use of "get", as it's not really used in Julia apart from getindex

How about domainof(obj) and supportof(obj) then?

I think you need to include the type [...] For example, the domain of sqrt depends on whether it's real or complex.

I had that in my initial draft of this proposal, but then I thought that maybe giving the type is both too much and too little: for distributions and measures we don't need it, since they have a well-defined and unique domain/support. For sqrt, adding the type would help, but for functions whose arguments include arrays, one would need both type and size (ValueShapes offers a solution here, but it's currently limited in what it can handle).

So we could, initially, limit ourselves to cases that are clear-cut? Or we could do domainof(obj, T) and allow, e.g. for distributions and measures, something like domainof(obj) = domainof(obj, testvalue(obj))?
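The fallback in the second option could look like the following sketch. `domainof`, `testvalue`, `ExponentialLike` and `NonNegHalfLine` are all hypothetical names. To match `testvalue(obj)` above, the sketch passes a representative value rather than a type as the second argument.

```julia
# Hypothetical distribution-like type (stands in for e.g. an exponential distribution).
struct ExponentialLike
    rate::Float64
end

# Hypothetical: a representative value of the kind the object produces.
testvalue(::ExponentialLike) = 1.0

# Hypothetical stand-in for the half-line [0, ∞) over the real type T.
struct NonNegHalfLine{T<:Real} end
Base.in(x::Real, ::NonNegHalfLine) = x >= 0

# Two-argument form: the domain may depend on the kind of value.
domainof(::ExponentialLike, ::T) where {T<:Real} = NonNegHalfLine{T}()

# One-argument fallback, as proposed above.
domainof(obj) = domainof(obj, testvalue(obj))

# Usage
d = ExponentialLike(2.0)
domainof(d)            # NonNegHalfLine{Float64}()
0.5 in domainof(d)     # true
```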

Nov 20 '23 15:11 oschulz

The analogue of "domain" for arrays is just called axes. What's wrong with just using domain?

Nov 20 '23 15:11 dlfivefifty

What's wrong with just using domain?

I'm not against just domain. But I think we should also have something to get support, and Distributions already "owns" support. Also, I could see people wanting to use domain as a variable name in code that deals with domains, so maybe domainof wouldn't be a bad choice?

Nov 20 '23 16:11 oschulz

I think we should let users use namespaces (DomainSets.domain etc.) if they need to avoid confusion. Users also use size, axes, length etc. as variable names, so this isn't a good argument.

I think we could move Distributions.support and domain to a simple package (DomainsCore or something) that just defines these two functions.

Nov 20 '23 17:11 dlfivefifty

Ownership of functions is certainly an issue here. Is there any consensus about using package extensions rather than packages like DomainSetsCore or JuliaApproximationCore? I'm starting to think we could meaningfully do with both (and that support or whatever it's called eventually should go into the latter and not the former.)

Nov 20 '23 17:11 daanhb

I think we could move Distributions.support and domain to a simple package (DomainsCore or something) that just defines these two functions.

In general I like that approach, having lightweight API-only packages, I think it enables us to take full advantage of multiple dispatch across packages. In this specific case though, I'm not sure - Distributions would then actually have to have a hard dependency on DomainSets - it couldn't implement domain(d) via a Pkg extension, since that would lead to unpredictable behaviour: domain(d) could only return a result if DomainSets is loaded, but users could request domain(d) without loading it.

I guess Distributions wouldn't accept DomainSets as a dependency, because it pulls in StaticArrays, which Distributions has avoided so far.

I think we should let users use namespaces (DomainSets.domain etc.) if they need to avoid confusion.

I agree in general, but with major packages like Distributions I think one should try to avoid name clashes.

Nov 20 '23 19:11 oschulz

Are you worried about type piracy? It’s a pretty standard idiom for packages to implement functionality defined in a Core package. So domain(d) would just error and say “load DomainSets.jl” if it’s not loaded.
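Here is a minimal sketch of that idiom, with two hypothetical modules standing in for packages. The first is a core module that only owns the generic function, plus an optional fallback with a helpful error message. The second is an implementing module that adds methods. All names and the message text are illustrative.

```julia
# Hypothetical lightweight API-only "core" package: it owns the generic function.
module DomainsCoreSketch

"""
    domain(x)

Return the domain of `x`. Methods are added by implementing packages.
"""
function domain end

# Optional fallback with a helpful message (text is illustrative).
domain(x) = error("no `domain` method for $(typeof(x)); load a package that implements it (e.g. DomainSets.jl)")

end # module

# Hypothetical implementing package: it extends a function it does not own.
module ImplSketch

import ..DomainsCoreSketch: domain

struct CoinFlip end

# A finite Base.Set is enough as a domain here.
domain(::CoinFlip) = Set([0, 1])

end # module

# Usage: returns a Set containing 0 and 1
DomainsCoreSketch.domain(ImplSketch.CoinFlip())
```

With this pattern, whether a given call errors depends on which implementing packages happen to be loaded in the session. That is the concern raised in the reply below.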

Nov 20 '23 19:11 dlfivefifty

Are you worried about type piracy? It’s a pretty standard idiom for packages to implement functionality defined in a Core package.

Yes, that wouldn't be a problem, since there would be a clear contract between "DomainSetsCore" and DomainSets.

So domain(d) would just error and say “load DomainSets.jl” if it’s not loaded

But it wouldn't always error, from the user's point of view: not if DomainSets is loaded implicitly via some other dependency path. And if DomainSets should vanish from that (possibly deep) dependency path, then code that worked before would break, even if upper version bounds are used correctly. I know we have this in some corners of the ecosystem, but I don't think it's a good pattern. We do require code to state all of its dependencies explicitly in Julia (unlike Python), and I think that has been very beneficial to the stability of the ecosystem.

Nov 20 '23 19:11 oschulz

How about this (if the Distributions maintainers are Ok with it):

* We add `domain` and `support` here; DomainSets owns them.

* Distributions implements them via a Pkg extension (same for MeasureBase and possibly other packages). Distributions deprecates its own `support` immediately, pointing users to `DomainSets.support`. Distributions.support returns the same as before for the distributions where it works now (basically just univariate dists), and `DomainSets.support` will return the same for these.

* At the next breaking release of Distributions (may be quite a while), Distributions removes its own `support` function.
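If such a plan were adopted, the deprecation step could look roughly like the sketch below. The modules stand in for the packages involved and are hypothetical. `Base.@deprecate` is the standard Base macro. The tuple returned by `support` is only a placeholder for an interval.

```julia
# Hypothetical new owner of the generic function.
module NewAPISketch
function support end
end # module

# Hypothetical package that previously had its own `support`.
module OldPkgSketch
import ..NewAPISketch

struct UniformLike
    a::Float64
    b::Float64
end

# Implement the shared function (tuple as a placeholder for an interval).
NewAPISketch.support(d::UniformLike) = (d.a, d.b)

# Keep the old name working but emit a deprecation warning pointing to the
# new function (visible e.g. when Julia runs with --depwarn=yes).
# At the next breaking release, this line would simply be deleted.
Base.@deprecate support(d::UniformLike) NewAPISketch.support(d)
end # module

# Usage: both calls return (0.0, 2.0); the second one is deprecated.
d = OldPkgSketch.UniformLike(0.0, 2.0)
NewAPISketch.support(d)
OldPkgSketch.support(d)
```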

CC @devmotion, I think we should ask you for some input here as well.

Nov 20 '23 19:11 oschulz

It seems Distributions.jl has its own interval type called RealInterval. That could be implemented using just IntervalSets, which is very lightweight.

Somewhere in the near future possibly both IntervalSets and DomainSets will depend on DomainSetsCore, which actually defines a function called domain for now.

Nov 27 '23 08:11 daanhb

Somewhere in the near future possibly both IntervalSets and DomainSets will depend on DomainSetsCore, which actually defines a function called domain for now.

Thanks @daanhb , that's great! Could DomainSetsCore also provide a function support or similar?

This got me thinking, though - just a proposal, I don't mean to be presumptuous: if we change a few things around anyway, should we maybe start dropping the term "domain" when it comes to the type of the sets themselves, in the future? And use terms like "domain" and "support" only in the name of functions that get such sets for objects/functions?

At the moment, we have RealLine <: FixedInterval <: TypedEndpointsInterval <: AbstractInterval <: Domain. But mathematically, the set of real numbers by itself is just a set. It is the domain of some functions, and the support of some functions, and it's the co-domain of some functions. But the real numbers, intervals, etc. are not domains intrinsically, by themselves. It's not what they are, it's a role they play in certain contexts. Maybe we could rename Domain to something like InfiniteSet at the top of the type hierarchy? It could possibly even be a subtype of Base.AbstractSet - after all, in some cases we might want to use a finite Base.Set for domains, supports and co-domains.

Nov 27 '23 10:11 oschulz

I like the name InfiniteSet. The problem with InfiniteSet <: AbstractSet is that types in Julia are more about conforming to an interface than mathematical definitions. Will an InfiniteSet actually conform to the rules governing AbstractSet? And these rules are not really written down anywhere...

Note that I added support for infinite arrays in Julia; see

https://github.com/JuliaLang/julia/blob/master/test/testhelpers/InfiniteArrays.jl

So I'm all for allowing infinite sets but we just need to be more careful and possibly add a testhelper to ensure the expected functionality is maintained.

Nov 27 '23 14:11 dlfivefifty

Hmm, I did not see a discussion about names coming up here. Points, to name just one example, are not infinite sets. And if I may short-circuit the discussion, inheritance from AbstractSet seems to bring mostly complications without tangible benefits. (The "is a" relationship between objects is not a good enough reason for inheritance, in my experience so far with Julia. I agree with @dlfivefifty that similar behaviour, i.e. adhering to an interface, matters more for generic programming.)

The word domain is used in many meanings, only one of which is "the domain of a function". The word is in itself also associated with sets. Some take it to mean connected open subsets, though even then people also talk about closed domains. In other contexts one might solve a PDE "on a domain", in which case it could be nearly anything. In complex analysis there are regions (e.g. there is a ComplexRegions package). And so on. There is no single good word.

Back to the issue at hand: I think that while DomainSets could be used to describe the domain or the support of a function (which to me is a subset of its domain on which it is nonzero), it should not define the concept as the package itself is not about functions. It is a coincidence that a function with the same name "domain" is defined in DomainSetsCore. I see no conceptual inconsistency in extending the meaning of that function to mean "the domain of a function" when applied to function objects. A function like "support" on the other hand might live in JuliaApproximationCore :-)

Nov 27 '23 21:11 daanhb

How about this (if the Distributions maintainers are Ok with it):

* We add `domain` and `support` here; DomainSets owns them.

* Distributions implements them via a Pkg extension (same for MeasureBase and possibly other packages). Distributions deprecates its own `support` immediately, pointing users to `DomainSets.support`. Distributions.support returns the same as before for the distributions where it works now (basically just univariate dists), and `DomainSets.support` will return the same for these.

* At the next breaking release of Distributions (may be quite a while), Distributions removes its own `support` function.

My feeling is that support (or domain or whatever its name is) is such a fundamental part of Distributions that it shouldn't be made optional and shouldn't depend on what packages users load (im- or explicitly).

Nov 27 '23 23:11 devmotion

Thanks for the clarifications @daanhb !

Hmm, I did not see a discussion about names coming up here.

No, sorry - maybe I should have opened a separate issue. In any case, I certainly didn't mean to seem pushy here.

I was just thinking about how we can express things like domains and supports better in packages like MeasureBase/MeasureTheory (they currently use some custom types). And about possible proposals in regard to Distributions to get more consistency in this area. IntervalSets + DomainSets seemed like a good fit.

In any case, from there on I was just thinking about "is a" vs. "role in relationship to" in regard to the terms "domain" vs. "set" in the type hierarchy. And I saw that the Readme begins with "DomainSets.jl is a package designed to represent simple infinite sets". So I thought maybe the term "domain" in the type hierarchy had historical reasons.

I agree with @dlfivefifty that similar behaviour, i.e. adhering to an interface, matters more for generic programming

Sure, generic programming in Julia doesn't strictly need supertypes. Taking full advantage of multiple dispatch, though, in my experience, gets much easier with common supertypes (where would we be without AbstractArray?) - especially in larger (systems of) packages/applications. Just my personal view, I know this is sometimes a contentious topic. Certainly subtyping Base classes can also get tricky and result in complex dispatch decisions - I had hoped AbstractSet would be "benign" in this respect (there are some generic methods in Base that seem to assume that an AbstractSet is finite, but maybe not too many to handle?). But I agree, it's something that has to be weighed carefully.

Again, I certainly didn't mean to push too far here.

A function like "support" on the other hand might live in JuliaApproximationCore

In other contexts (InverseFunction) we've been talking about the need for more central APIs to query function properties, like maybe a "FunctionTraits" package, for functions like ismonotonic, islinear, iscontinuous and so on. Maybe such a package could be a good place for functions to query domain and support as well? In contrast to questions about linearity, which have a yes/no answer, we would need a community consensus on what types should be returned here, though. Do you think it's within the scope of DomainSets to play such a role (where applicable)?

Nov 27 '23 23:11 oschulz

@devmotion : My feeling is that support (or domain or whatever its name is) is such a fundamental part of Distributions that it shouldn't be made optional and shouldn't depend on what packages users load (im- or explicitly).

Thanks for the feedback @devmotion ! Question: Are there currently any plans to extend Distributions.support to multivariate distributions (and how to represent the result)? And would you feel comfortable with support being hosted in very lightweight central API-def package?

Nov 27 '23 23:11 oschulz

@dlfivefifty: Will an InfiniteSet actually conform to the rules governing AbstractSet? And these rules are not really written down anywhere.

Good point. We could suggest defining that more precisely in the language docs? :-) So far I haven't seen anything too "exotic" with AbstractSet "in the field", so maybe it's not too late ... ;-)

Note that I added support for infinite arrays in Julia; see

❤️

Nov 27 '23 23:11 oschulz

In other contexts (InverseFunction) we've been talking about the need for more central APIs to query function properties, like maybe a "FunctionTraits" package, for functions like ismonotonic, islinear, iscontinuous and so on. Maybe such a package could be a good place for functions to query domain and support as well? In contrast to questions about linearity, which have a yes/no answer, we would need a community consensus on what types should be returned here, though. Do you think it's within the scope of DomainSets to play such a role (where applicable)?

It would be great to see some common ground here! Apart from interfaces there is a lot of duplicated effort, including in DomainSets, especially for affine maps. I was hoping to be able to move everything related to functions and maps into a separate package (#92), or to use existing packages.

Nov 28 '23 04:11 daanhb

Sure, generic programming in Julia doesn't strictly need supertypes. Taking full advantage of multiple dispatch, though, in my experience, gets much easier with common supertypes (where would we be without AbstractArray?) - especially in larger (systems of) packages/applications. Just my personal view, I know this is sometimes a contentious topic. Certainly subtyping Base classes can also get tricky and result in complex dispatch decisions - I had hoped AbstractSet would be "benign" in this respect (there are some generic methods in Base that seem to assume that an AbstractSet is finite, but maybe not too many to handle?). But I agree, it's something that has to be weighed carefully.

Sure, dispatching on a common supertype is beneficial, as is being able to reuse code written for abstract types. I just think both of these advantages are limited in the context of domains, and we've actively been trying to get away from having a common type. (It would be a funny way to settle the eltype debate.) Yet it's good to look at the methods in Base for sets, thanks - it seems we've been missing out on opportunities to be more consistent (like using "issetequal", which seems spot on).

Nov 28 '23 04:11 daanhb

It would be great to see some common ground here! Apart from interfaces there is a lot of duplicated effort, including in DomainSets, especially for affine maps.

I'm so with you here! I wish we had an AbstractLinearOperator <: AbstractMultiplicativeOperator that people would agree to use, in a central place. Then we wouldn't have to use untyped arguments in algorithms (like solvers) that accept "anything that does a linear mul" all the time (resulting in later failures with less clear error messages if users pass the wrong thing, less possible specialization, and so on).

Nov 28 '23 11:11 oschulz

context of domains, and we've actively been trying to get away from having a common type

Do we really have to, though? Like @dlfivefifty says, we even have infinite arrays now - and AbstractArray makes a lot of assumptions and has a lot of generic default implementations. Maybe it would be worth a try to see if we can stretch AbstractSet to infinite sets, and other special sets, without getting into (too much) dispatch trouble? (Or have you guys been down that road before and it didn't work out, maybe?)

I'm just thinking about how far AbstractArray has carried us - we have arrays that you can only read (like mapped arrays), arrays without an efficient single-element getindex (like GPU arrays), arrays without a fixed or even finite size, and so on - but the fact that they're all AbstractArrays has enabled an incredible amount of generic code. If we had independent type hierarchies, it would be a mess of Unions, even more package extensions, and so on - just my take. That's why I wonder if we shouldn't give AbstractSet a chance here, so to speak.

There are other cases where we don't have a common supertype, for good reason. Common supertypes for iterable objects for example - and we shouldn't have one, iterability isn't always the primary property of an object. So iterate is "just" a (very successful) API without a type hierarchy.
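As an illustration, the hypothetical `Countdown` type below works with `collect`, `sum` and `for` loops purely because it implements the `Base.iterate` protocol. It subtypes nothing.

```julia
# A plain struct with no supertype beyond `Any`.
struct Countdown
    from::Int
end

# The iteration protocol: return `(item, nextstate)`, or `nothing` when done.
Base.iterate(c::Countdown, state = c.from) = state < 1 ? nothing : (state, state - 1)

# Optional traits that let `collect` preallocate with the right element type.
Base.length(c::Countdown) = max(c.from, 0)
Base.eltype(::Type{Countdown}) = Int

# Usage: generic code written against the protocol now works.
collect(Countdown(3))   # [3, 2, 1]
sum(Countdown(4))       # 10
```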

So I fully agree: just because objects implement a common API doesn't automatically make them candidates for a common type hierarchy. We don't have multiple inheritance or similar in the language and should choose supertypes very carefully. But, to take full advantage of multiple dispatch, we should build common type hierarchies where they also make sense semantically, and where we don't run into "but it's just as much an A as a B" situations.

In this case I would argue that the primary property of a set, and the domains and things we talk about here, is their, well, "set-likeness". :-) That's why, IMHO, building down from AbstractSet would make sense not just technically, but also semantically.

Nov 28 '23 11:11 oschulz

The issue is that looking at https://github.com/JuliaLang/julia/blob/master/base/abstractset.jl there's basically no code that will still work for infinite sets 😅

Here's an experiment to detect the interface:

```julia-repl
julia> struct MySet <: AbstractSet{Int} end

julia> Set(MySet())
ERROR: MethodError: no method matching length(::MySet)

julia> Base.length(::MySet) = 1

julia> Set(MySet())
ERROR: MethodError: no method matching iterate(::MySet)

julia> Base.iterate(::MySet, k...) = iterate((1,), k...)

julia> Set(MySet())
Set{Int64} with 1 element:
  1
```

So the interface for AbstractSet is:

1. length
2. iterate
3. in (probably... it was not needed in the example above)
4. eltype (this is automatic)

So the issue for domains is that we lose (2), and potentially (4) no longer makes sense. We could make a major PR to Julia to allow overloading, say, intersect or union when the length is infinite... I think that would be more difficult than the minimal changes I made for InfiniteArrays.jl (and there were complaints about it being merged when some Julia developers realised I had added Base.oneto without them knowing!)

We could take the ArrayLayouts.jl approach and have a parallel implementation of everything. I'm not sure I'd recommend it either.

Nov 28 '23 11:11 dlfivefifty

@dlfivefifty: there's basically no code that will still work for infinite sets 😅

Sure, but abstract types don't necessarily need to provide generic implementations/methods for the functions associated with them, right? What there should be, is contracts/semantics of the interface (and yes, we should write those down explicitly way more often :-) ).

If you build an InfiniteArray like the MySet in the example above, pretty much none of the default methods will work either. ;-)

And sure, infinite sets won't support length and iterate, but neither do infinite arrays (at least not iteration in finite time), and usually length and iteration are assumed to be quite fundamental for arrays, I'd say. :-) But what all arrays that we have do support is random access: getindex (of single elements or ranges of elements). It's what makes them fundamentally different from sequences.

I would say it's very similar for sets: finite sets have a length and support iteration, infinite sets don't. But what they all share, what makes them sets, is the ability to check if they contain an element in a (somewhat) efficient manner: Base.in. I'd say it is for sets what getindex is for arrays. Arrays have a (somewhat) efficient getindex, sets have a (somewhat) efficient in. At least that's how I would define it.
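Here is a minimal sketch of that view. It assumes a hypothetical `EvenIntegers` type that subtypes `AbstractSet{Int}` but implements only `in`, with no `length` and no `iterate`.

```julia
# Hypothetical infinite set defined purely through membership.
struct EvenIntegers <: AbstractSet{Int} end

# `in` plays the role for sets that `getindex` plays for arrays: it is the one
# operation every set supports, without enumerating any elements.
Base.in(x::Real, ::EvenIntegers) = isinteger(x) && iszero(rem(x, 2))
Base.in(x, ::EvenIntegers) = false

# Usage
4 in EvenIntegers()      # true
3 in EvenIntegers()      # false
-2.0 in EvenIntegers()   # true
```

Generic `AbstractSet` methods in Base that rely on length or iteration, such as `union`, would still fail on such a type. That is the concern raised above.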

And of course we'll want to support operations like union - where possible. We won't be able to implement that for all possible combinations of sets in a sensible way, but then we can't implement vcat for all possible combinations of arrays very sensibly either.

(It should actually be pretty easy to add a generic union implementation for AbstractSets to Base, just using in, and retrofit it via Compat. Generic unions could even be made iterable if their constituents are. Generic intersects would also be easy, no way to get default iteration there, though.)
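Here is a minimal sketch of such generic operations. It assumes hypothetical lazy wrapper types `LazyUnion` and `LazyIntersection`, which are not Base API and call only `in` on their constituents. Iteration for unions of iterable sets is left out to keep the sketch short. `Multiples` is a hypothetical in-only infinite set used for the demonstration.

```julia
# Hypothetical in-only infinite set: all integer multiples of k.
struct Multiples <: AbstractSet{Int}
    k::Int
end
Base.in(x::Integer, m::Multiples) = iszero(rem(x, m.k))
Base.in(x, ::Multiples) = false

# Hypothetical lazy set operations: they store their arguments and only ever
# call `in` on them, so they work for finite and infinite sets alike.
struct LazyUnion{T,A,B} <: AbstractSet{T}
    a::A
    b::B
end
struct LazyIntersection{T,A,B} <: AbstractSet{T}
    a::A
    b::B
end

# Element type: promoted from the arguments (a simple, conservative choice).
lazyunion(a, b) = LazyUnion{promote_type(eltype(a), eltype(b)),typeof(a),typeof(b)}(a, b)
lazyintersect(a, b) = LazyIntersection{promote_type(eltype(a), eltype(b)),typeof(a),typeof(b)}(a, b)

Base.in(x, u::LazyUnion) = in(x, u.a) || in(x, u.b)
Base.in(x, s::LazyIntersection) = in(x, s.a) && in(x, s.b)

# Usage: mix an infinite set with a finite Base.Set
u = lazyunion(Multiples(3), Set([1, 2, 4]))
9 in u                                           # true
5 in u                                           # false
12 in lazyintersect(Multiples(2), Multiples(3))  # true
```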

(We could even define iteration for certain infinite sets, for hypercubes for example, we could use quasirandom sequences like Sobol to implement infinite iteration. That could actually be useful in practice.)
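Here is a dependency-free sketch of the idea for the unit square. It substitutes the simpler Halton sequence (bases 2 and 3) for Sobol, and `UnitSquare` and `radical_inverse` are hypothetical names. Declaring `Base.IteratorSize` as `Base.IsInfinite()` tells generic code that the iteration never ends, so it has to be consumed lazily, e.g. via `Iterators.take`.

```julia
# Hypothetical: the unit square [0, 1]^2 as an infinite, iterable set.
struct UnitSquare end

Base.in(p::NTuple{2,Real}, ::UnitSquare) = all(x -> 0 <= x <= 1, p)

# Radical inverse of n in base b (van der Corput sequence), the 1-D building
# block of the Halton sequence.
function radical_inverse(n::Integer, b::Integer)
    x, f = 0.0, 1.0 / b
    while n > 0
        n, d = divrem(n, b)
        x += d * f
        f /= b
    end
    return x
end

# Infinite iteration: the n-th element is the n-th 2-D Halton point (bases 2, 3).
Base.iterate(::UnitSquare, n::Int = 1) = ((radical_inverse(n, 2), radical_inverse(n, 3)), n + 1)
Base.IteratorSize(::Type{UnitSquare}) = Base.IsInfinite()
Base.eltype(::Type{UnitSquare}) = NTuple{2,Float64}

# Usage: first three points ≈ (0.5, 1/3), (0.25, 2/3), (0.75, 1/9)
pts = collect(Iterators.take(UnitSquare(), 3))
```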

Nov 28 '23 11:11 oschulz

Do we really have to, though? Like @dlfivefifty says, we even have infinite arrays now - and AbstractArray makes a lot of assumptions and has a lot of generic default implementations. Maybe it would be worth a try to see if we can stretch AbstractSet to infinite sets, and other special sets, without getting into (too much) dispatch trouble? (Or have you guys been down that road before and it didn't work out, maybe?)

You raise good points but one can argue the other way here. It's great that the community can agree on using AbstractArray as a common supertype, but it has side-effects. Plenty of objects can be thought of as an array in a sense, yet have another more logical supertype. Images come to mind (this goes way back). Not inheriting from AbstractArray makes it hard to use the nice indexing logic that is in Base.

Nov 28 '23 15:11 daanhb

In this case I would argue that the primary property of a set, and the domains and things we talk about here, is their, well, "set-likeness". :-) That's why, IMHO, building down from AbstractSet would make sense not just technically, but also semantically.

To be fair, many packages out there that define triangles and rectangles do not actually have a definition of in. The type represents the concept of the set, but really just stores the data to define it (and may have some other supertype). The domain interface at least enables someone else (crucially not per se the package developers themselves) to add the interpretation of a set, and allows interoperability with other objects having the same interpretation. Having an AbstractSet with a design elaborate enough to satisfy many use cases might be very nice but will never cover everything, something else is needed anyway.

Nov 28 '23 15:11 daanhb
