plum fix: different dispatch instances on same function

Jul 20 '24 16:07 nstarman

Pull Request Test Coverage Report for Build 10133091731

Details

54 of 66 (81.82%) changed or added relevant lines in 1 file are covered.
No unchanged relevant lines lost coverage.
Overall coverage decreased (-0.9%) to 98.871%

Changes Missing Coverage	Covered Lines	Changed/Added Lines	%
plum/dispatcher.py	54	66	81.82%
<!--	Total:	54	66

Totals
Change from base Build 9969956439:	-0.9%
Covered Lines:	1314
Relevant Lines:	1329

💛 - Coveralls

Jul 20 '24 16:07 coveralls

Hey @nstarman!

Thanks for creating a PR. :) I'm not entirely sure about this change. For a function, function._f corresponds to the method that was given whenever the function was created. Here's an example:

from number import Number


@dispatch.abstract
def add(x: Number, y: Number):
    """Add two numbers.

    Generic docstring.
    """


@dispatch
def add(x: int, y: int):
    return x + y


# At this point, `add._f` is the function given in the abstract implementation.

I'm therefore worried that this might result in unexpected behaviour where people expect the attribute _f to be something that it isn't.

In general, I think passing a function to @dispatch is not correct. Instead, one should use @dispatch to define methods. Hopefully this makes sense, but what I'm trying to say is that a function is a generic function name, like add, and a method is a specific implementation of a function with a type signature, like add(x: int, y: int). In particular, a function does not have a type signature, because it chooses which signature to use whenever the function is called (the process of dispatching). Therefore, giving a function to @dispatch wouldn't be right, because there is no well-defined type signature.

Instead, I'm thinking that this instead we should give an error. What would you think about this?

Jul 22 '24 14:07 wesselb

I'm not sure I see what you mean. This PR is about making it possible to do

@dispatch2
@dispatch1
def f(x: int, y: float):
   return x + y

where dispatch1 and dispatch2 are different instances of Dispatcher.

~I don't think this approach suffers from the problem you outline. The following should (correct me if I'm wrong) find the correct method.~

I did discover a fault. Adding in two dispatch.abstract in the test suite...

dispatch1 = Dispatcher()
dispatch2 = Dispatcher()

@dispatch1.abstract
def f(x: Number, y: Number):
    return x - 2 * y

@dispatch2.abstract
def f(x: Number, y: Number):
    return x - y

@dispatch2
@dispatch1
def f(x: int, y: float):
    return x + y

@dispatch1
def f(x: str):
    return x

ns1 = SimpleNamespace(f=f)

@dispatch2
def f(x: int):
    return x

ns2 = SimpleNamespace(f=f)

assert ns1.f("a") == "a"
assert ns1.f(1, 1.0) == 2.0

assert ns2.f(1) == 1
assert ns2.f(1, 1.0) == 2.0

Means that method, _ = method.resolve_method(signature) raises an error because it's trying to find the abstract method. What's a better way of getting dispatch2 to see the correct method (function with specific signature) when the dispatches are nested?

Jul 25 '24 17:07 nstarman

@nstarman What would you think of changing the pattern to something like this?

@dispatch1.chain(dispatch2)
def f(x: int, y: float):
    return x + y

Then, internally, the chain method would simply call dispatch1(f) and dispatch2(f), so there wouldn't be any unwrapping required. It's slightly less nice syntax wise, but perhaps acceptable?

Jul 26 '24 14:07 wesselb

Interesting! Or what about having a DispatcherBundle class such that it's


@(dispatch1 | dispatch2)
def func(...): ...

Where

class AbstractDispatcher: ...

class Dispatcher(AbstractDispatcher):
    def __or__(self, other: Dispatcher) -> DispatcherBundle: ...

class DispatcherBundle(AbstractDispatcher):
    def __call__(...):
        for dispatcher in self:
            ...

I think the syntax is more Pythonic. And then the bundle object is reusable and looks like the regular dispatcher. Moving most of the logic to a common base class would mean this is a smallish code addition!

Alternatively the or operator could just return a call chain function, though this prevents chaining e.g multi.

Jul 26 '24 16:07 nstarman

@nstarman Aha, I like that even better, since now the syntax is symmetric and indeed more Pythonic! Would be very happy to go for that. :)

Jul 26 '24 16:07 wesselb

Just need to add a guard for python 3.8 since @(...) syntax is not supported until py3.9+. As a side note, when were you planning to drop 3.8 support?

Jul 28 '24 16:07 nstarman

If I can share a thought about DispatcherBundle and the double registration of functions...

I would caution against including such features into plum itself, and would rather refactor and open up the internals in a reasonable way that makes it easy to build such things on top organically and without fragmenting the ecosystem.

One of the reasons is that multiple dispatch in python is already somewhat 'weird' and not native. Plum is trying to make it possible and reasonable, but there's already a long way to go to get there. While other languages offer great inspiration and carefully crafted ideas or frameworks, we lack a coherent view on how to get a 'reasonable MD that works and is intuitive' in Python.

However, I do not think that adding many features to plum itself is the best way to get there. What is proposed in this PR is an exotic use case that is intricated. Already MD is 'hard to understand' to the average python user, and dispatcher bundles seem to me an even more complicated concept never seen in another language and that hides intent even more! While I understand that it might make sense in your particular project* , I do not think that those untested ideas belong to plum itself, where they might confuse users even more and push them in directions I personally do not think we should endorse.

At the same time, I would like plum to foster experimentation of such 'crazy ideas', because maybe at some point we might land on the 'right approach' to MD in Python. To do that, I think that a reasonable part of plum's internal should be expandable and reusable such that you'd be able to build this feature in your own library, on top of plum.

In short, I think this is a strong idea about how MD should work, going in a step that does not belong to MD (multiple dispatchers for the same functions).

I really have never seen such an exotic idea and I think that if it was never attempted, there might be a reason. I would strongly suggest to think twice about such a complicated approach and try to take a few steps back and solve it differently.

Jul 28 '24 22:07 PhilipVinc

@PhilipVinc, you make some solid arguments! @nstarman, what do you think?

In this case, I think that the functionality could be fairly easily implemented with a third-party function resulting in something like this:

@chain_dispatchers(dispatch1, dispatch2)
def f(…):
    …

@nstarman, would you also be happy with a solution like this, or do you think this functionality should belong in Plum? If the latter, possibly we could organise more experimental functions that are not to be used by the average user in e.g. plum.experimental, unless @PhilipVinc thinks that we should consider doing not even that.

Jul 30 '24 15:07 wesselb

I think @PhilipVinc raises good points! Without language changes, MD in Python will be implemented by on and by objects of which there can be multiple instances. I agree this is not ideal (and hopefully not too common) and that we don't necessarily want to add functionality to make this more visible. I'm happy for chain_dispatchers to be in my 3rd party code, but IMO given that this MD is in Python and done via objects, functionality for working with multiple of those objects would be good to have in a plum.experimental.

Jul 30 '24 15:07 nstarman

Happy to go with plum.experimental if @PhilipVinc is also happy with that!

Jul 30 '24 16:07 wesselb

Experimental is ok for me.

However I must say I disagree with this vision that 'MD' is implemented through 'objects'. That's just an implementation detail in my eyes, and we could easily make the Dispatcher a singleton.

In fact, a question I never asked @wesselb is: why is not a singleton? What was the use case you had in mind for multiple dispatchers?

I fail to see in which case it would be useful to have multiple dispatcher objects, and in particular why you want to use it with the same functions?

For my own curiosity, could you let me know what is your use case?

Jul 30 '24 17:07 PhilipVinc

@PhilipVinc. For me this is to support the multiple different versions of the same function that exist in NumPy and other array libraries during the Array API transition. For example jax.numpy.arange and jax.experimental.array_api have different signatures and requirements. See unxt and quaxed/array_api and quaxed/numpy This specific situation will eventually be resolved once array-api is fully adopted.

Jul 30 '24 18:07 nstarman

@PhilipVinc Multiple dispatchers are required to emulate something like how scoping works in Julia. For example, in Julia, if package A defines a function f but does not export it, then you can define your own function f without the methods defined in A. Only once A exports f become these methods visible in your local scope. With a singleton dispatcher, all methods for function f would be visible to everyone and you could accidentally override methods or extend functions that weren’t supposed to be public.

@nstarman If chain_dispatchers is only required temporarily, maybe its better to define it in your package. If you think it’s useful long term, then plum.experimental sounds great. :)

Jul 30 '24 20:07 wesselb