pluggy WIP: Draft iterable hooks implementation

Example/untested draft implementation for discussion in #50. Ping @fschulze too.

Nov 12 '17 18:11 goodboy

Just added a more correct/complete integration with the manager and a test to verify.

Still rough but a starting point nonetheless. I don't think I'm a big fan of having two sets of hooks pm.hook and pm.ihook I'd rather just have an explicit mark with indicates whether the hook is iterable or not but maybe it's good to have both?

Also what about supporting hooks as generators? Bad idea? We'd probably only support it for py3 anyway since we can use yield from.

Nov 12 '17 23:11 goodboy

@tgoodlet nice, thanks! This is a good starting point.

I don't think I'm a big fan of having two sets of hooks pm.hook and pm.ihook I'd rather just have an explicit mark with indicates whether the hook is iterable or not but maybe it's good to have both?

Currently hook calls either return a list (normal hooks) or a single value (hooks marked as firstresult), correct? I don't see the value of having hook and ihook, I believe we should change the semantics of the first case, normal hooks will always return an iterator. It is easy enough to call list(it) if one needs a list.

Nov 12 '17 23:11 nicoddemus

@nicoddemus so techincally we could jig this such that pluggy.callers._multicall() is always a generator and if a user wants a non-iterable call we just do: return next(_multicall()) where internally there's just an if iterate: which skips the yield line.

But then regular calls will be slower. However, if all of this is cythonized it may not matter.

I believe we should change the semantics of the first case, normal hooks will always return an iterator. It is easy enough to call list(it) if one needs a list.

Yeah only problem is that's a low slower.

Nov 12 '17 23:11 goodboy

if a user wants a non-iterable call we just do

Hmm sorry what you mean, could you clarify?

But then regular calls will be slower

Oh is this the reason why we are having both hook and ihook since the beginning?

Nov 12 '17 23:11 nicoddemus

if a user wants a non-iterable call we just do Hmm sorry what you mean, could you clarify?

We could theoretically make _multicall and _itercall the same function (which contains a yield making it a generator function) and simply skip that yield expression if the iterate flag is False. If you want to call a generator function and have it behave the same as a regular function you'd need to do next(gen_func(*args)) but this is about 4-5 times slower then a regular function call (at least according to my micro benches in py3.6.3).

Oh is this the reason why we are having both hook and ihook since the beginning?

Precisely. But what I'm thinking now is that maybe we can just cythonize the function and have it remain a single definition, although I haven't ever tried using yield in a cython func.

That being said, regarding cython, I'd always thought a cythonized call loop would be an optional dependency.

Nov 12 '17 23:11 goodboy

Precisely.

OK thanks for confirming! 👍

but this is about 4-5 times slower then a regular function call (at least according to my micro benches in py3.6.3).

This is really strange. I remember your benchmark and it seemed correct.

I did a quick benchmark unrelated to pluggy, just to show the speed difference between constructing a list vs using a generator:

def f():
    r = []
    for i in range(100):
        r.append(i)
    return r

def g():
    for i in range(100):
        yield i

X:\>python --version
Python 3.6.1 :: Continuum Analytics, Inc.

X:\>python -m timeit -s "from perf import f, g" "f()"
100000 loops, best of 3: 6.79 usec per loop

X:\>python -m timeit -s "from perf import f, g" "for i in g(): pass"
100000 loops, best of 3: 4.59 usec per loop

And this results match my expectations. Perhaps is there something else in pluggy's code that might be causing this difference?

Nov 13 '17 00:11 nicoddemus

What if we time the two approaches to measure them? We can use the code from your test as basis:

from pluggy import PluginManager, HookspecMarker, HookimplMarker

hookspec = HookspecMarker("example")
hookimpl = HookimplMarker("example")

pm = PluginManager("example")

class Hooks(object):
    @hookspec
    def he_method1(self, arg):
        pass

pm.add_hookspecs(Hooks)


class Plugin1(object):
    @hookimpl
    def he_method1(self, arg):
        pass

class Plugin2(object):
    @hookimpl
    def he_method1(self, arg):
        pass

class Plugin3(object):
    @hookimpl
    def he_method1(self, arg):
        pass

class Plugin4(object):
    @hookimpl
    def he_method1(self, arg):
        pass

class PluginWrapper(object):
    @hookimpl(hookwrapper=True)
    def he_method1(self, arg):
        yield

pm.register(Plugin1())
pm.register(Plugin2())
pm.register(Plugin3())
pm.register(Plugin4())
pm.register(PluginWrapper())

X:\>python -m timeit -s "from perf2 import pm" "for x in pm.hook.he_method1(arg=None): pass"
100000 loops, best of 3: 8.19 usec per loop

Nov 13 '17 01:11 nicoddemus

Duh, why don't I do it myself? 😁

Forked your branch and get nearly the same timings:

{env35} X:\>python -m timeit -s "from perf2 import pm" "for x in pm.hook.he_method1(arg=None): pass"
100000 loops, best of 3: 8.15 usec per loop

{env35} X:\>python -m timeit -s "from perf2 import pm" "for x in pm.ihook.he_method1(arg=None): pass"
100000 loops, best of 3: 8.28 usec per loop

Unless I'm missing something, their performance is nearly identical, so IMHO we should go to just change hook to always return an iterable.

Nov 13 '17 01:11 nicoddemus

@nicoddemus ahh yeah I was testing it the opposite way... I was benching calling a generator which skips yielding as opposed to just embracing it and capturing the output in a list. Your way is better.

The only thing I wonder is how does it perform under different numbers of hooks/wrappers as in the benchmark test suite. I think we should try that out first while there's separate implementations. You can make the changes in testing/benchmark.py if you're so bold ;)

Nov 13 '17 04:11 goodboy

i would like to bring attention to the fact that hookwrappers are unable to semantically correctly operate if we stream partial results out without giving them a chance to operate/alter the stream

Nov 13 '17 06:11 RonnyPfannschmidt

i would like to bring attention to the fact that hookwrappers are unable to semantically correctly operate if we stream partial results out without giving them a chance to operate/alter the stream

Hmm indeed that is a problem, because the post-hookwrappers can't alter the outcome, because the outcome has already been processed by the caller.

Not sure what we should do here, it seems to be a big blocker.

Nov 13 '17 09:11 nicoddemus

i beleive for this feature we have the choice between massive cost in terms of conceptual complexity for implementation or glaring conceptual holes in the api

i leaning towards avoiding it due to inconsistency

Nov 13 '17 09:11 RonnyPfannschmidt

Hmm indeed that is a problem, because the post-hookwrappers can't alter the outcome, because the outcome has already been processed by the caller.

@RonnyPfannschmidt @nicoddemus yes agreed. This is why I was implementing it separate from the main hook call set. I think @RonnyPfannschmidt is right you can't combine to the 2 ideas which makes me wonder if the only way we could support it is by having seperately marked iterable hooks which are also called from a different _HookRelay. We can support iterable hooks but wrappers in that case aren't a thing?

Nov 13 '17 18:11 goodboy

For the use cases I have in mind, making the hook explicitly iterable would be fine, if not even preferred. If such hooks then don't support hookwrappers would be fine by me.

Nov 13 '17 19:11 fschulze

From a user standpoint, we would mark it as:

    @hookspec(iterable=True)
    def myproject_get_password(self, arg1, arg2):
        """
        """

?

Nov 13 '17 20:11 nicoddemus

@nicoddemus yeah I think that could work. The question is do we just call such hooks the same way as others - pm.hook.myproject_get_password()? I guess that's fine yeah?

Nov 14 '17 01:11 goodboy

@nicoddemus, @RonnyPfannschmidt @fschulze just had a convo with @vodik about this and I think we could make an alternative version of what a wrapper is for iterable hooks that's even more powerful.

Instead of having setup and teardown before and after all hookimpl calls are complete, instead define a wrapper as more of an intercepting generator.

That is a hookwrapper is a generator which looks as follows:

@hookimpl(wrapper=True):
def myproject_get_password(arg1, arg2):
    iter_results = yield
    for result in iter_results:
        # intercept and do something with each result
        yield result   # delivers the hookimpl call result to the hook caller (or next hookwrapper in line)
        # do something after the result has been delivered but before the next 
        # hook call has been invoked inside _itercall

Here iter_results is the generator returned from _itercall().

We could also do some even more interesting stuff in py3.3+. We can actually support the original wrapper semantics in a cleaner way with yield from:

@hookimpl(wrapper=True):
def myproject_get_password(arg1, arg2):
    iter_results = yield
    # do the normal pre-call stuff
    try:
        results = yield from iter_results  # yields every hookcall upwards to caller; returns full list of yielded results
    except BaseException as err:
        if raise_on_err:
            raise err
    else:
        # do the normal post-call stuff
        results.clear()
        results.append('my_override_result')
        return results  # this could still be used for the normal (non-iterable) hook calls

Which has the extra nice nicety that we really wouldn't need the pluggy.callers._Result type any more. We of course would offer these types of iterable hooks (and their corresponding wrappers) alongside the existing API and could encourage migration if we like the new version better?

Let me know what you guys think!

Nov 14 '17 03:11 goodboy

it has a simple question - what does this do to python2.7

Nov 14 '17 08:11 RonnyPfannschmidt

@tgoodlet about your first example, when you say yield result, what does the caller (which is calling the wrapper) does with that result? Replace the original result?

We of course would offer these types of iterable hooks (and their corresponding wrappers) alongside the existing API and could encourage migration if we like the new version better?

At first it does seem the new version is more flexible but I think it is a little more complex to write/understand, so I'm not sure it is better. It might be just a matter of getting used to it though. Don't know, have to think about it a little better.

it has a simple question - what does this do to python2.7

IIRC it should work the same, except yield from would have to be replaced by an explicit loop. Instead of:

results = yield from iter_results

In py27 one has to write:

results = []
for result in iter_results:
    yield result
    results.append(result)

@tgoodlet please correct me if I'm wrong.

Nov 14 '17 09:11 nicoddemus

your first example, when you say yield result, what does the caller (which is calling the wrapper) does with that result? Replace the original result?

@nicoddemus yes exactly. So a wrapper is able to intercept each value before delivered to the caller. This provides a way to execute before/after each iteration as well as modify the value or, not deliver it at all.

For example the wrapper could iterarate all hook calls but not deliver any results up to the caller until it's prepossessed them:

@hookimpl(wrapper=True):
def myproject_get_password(arg1, arg2):
    iter_results = yield
    results = []
    for result in iter_results:
        results.append(result)
    # do some checks over all results
    if 'myresult' in results:
        pass
    else:
        results.append('myresult')
    # do the preprocessing of each result
    for result in filter(preprocess, results):
        yield result   # finally delivered to caller

At first it does seem the new version is more flexible but I think it is a little more complex to write/understand, so I'm not sure it is better.

hookimpl still are iterated the same way. The only difference is that wrappers now must be implemented as generator functions.

IIRC it should work the same, except yield from would have to be replaced by an explicit loop.

Yes of course. I was just pointing out how nice it would look. Also, there's some optimizations python does with yield from which are pretty slick. But yes of course you can do the loop in py2.7.

Nov 14 '17 16:11 goodboy

@tgoodlet as far as i can tell thats a glefully fractioned controll flow using the same structural mechanism for absolutely distinct meanings, im reasonably sure this will break people necks from partial/complete misuse

im pretty opposed to is as of now - if you cant demonstrate a reasonable api for the hookwrappers, just disallow them for now

from my pov this cant possibly have a reasonable api without async await/async for/

Nov 14 '17 16:11 RonnyPfannschmidt

I agree with @RonnyPfannschmidt, my gut feeling is that this is creating a complex control flow which might be problematic in the future (it is a fun thought exercise though).

As @RonnyPfannschmidt said, I think we should go ahead with the original idea of iterable hook calls but without hookwrapper support; if we come up with a nice solution in the future, it is just a matter to allow hookwrappers from that point on.

Nov 14 '17 18:11 nicoddemus

@nicoddemus @RonnyPfannschmidt cool I'll see if I can finalize it this week.

Nov 14 '17 23:11 goodboy

Hey guys I'll crank this out over the weekend but I need #101 to land first to avoid conflicts.

Nov 24 '17 16:11 goodboy