WIP: Draft iterable hooks implementation
Example/untested draft implementation for discussion in #50. Ping @fschulze too.
Just added a more correct/complete integration with the manager and a test to verify.
Still rough but a starting point nonetheless.
I don't think I'm a big fan of having two sets of hooks pm.hook and pm.ihook I'd rather just have an explicit mark with indicates whether the hook is iterable or not but maybe it's good to have both?
Also what about supporting hooks as generators? Bad idea? We'd probably only support it for py3 anyway since we can use yield from.
@tgoodlet nice, thanks! This is a good starting point.
I don't think I'm a big fan of having two sets of hooks pm.hook and pm.ihook I'd rather just have an explicit mark with indicates whether the hook is iterable or not but maybe it's good to have both?
Currently hook calls either return a list (normal hooks) or a single value (hooks marked as firstresult), correct? I don't see the value of having hook and ihook, I believe we should change the semantics of the first case, normal hooks will always return an iterator. It is easy enough to call list(it) if one needs a list.
@nicoddemus so techincally we could jig this such that pluggy.callers._multicall() is always a generator and if a user wants a non-iterable call we just do:
return next(_multicall())
where internally there's just an if iterate: which skips the yield line.
But then regular calls will be slower. However, if all of this is cythonized it may not matter.
I believe we should change the semantics of the first case, normal hooks will always return an iterator. It is easy enough to call list(it) if one needs a list.
Yeah only problem is that's a low slower.
if a user wants a non-iterable call we just do
Hmm sorry what you mean, could you clarify?
But then regular calls will be slower
Oh is this the reason why we are having both hook and ihook since the beginning?
if a user wants a non-iterable call we just do Hmm sorry what you mean, could you clarify?
We could theoretically make _multicall and _itercall the same function (which contains a yield making it a generator function) and simply skip that yield expression if the iterate flag is False. If you want to call a generator function and have it behave the same as a regular function you'd need to do next(gen_func(*args)) but this is about 4-5 times slower then a regular function call (at least according to my micro benches in py3.6.3).
Oh is this the reason why we are having both hook and ihook since the beginning?
Precisely.
But what I'm thinking now is that maybe we can just cythonize the function and have it remain a single definition, although I haven't ever tried using yield in a cython func.
That being said, regarding cython, I'd always thought a cythonized call loop would be an optional dependency.
Precisely.
OK thanks for confirming! 👍
but this is about 4-5 times slower then a regular function call (at least according to my micro benches in py3.6.3).
This is really strange. I remember your benchmark and it seemed correct.
I did a quick benchmark unrelated to pluggy, just to show the speed difference between constructing a list vs using a generator:
def f():
r = []
for i in range(100):
r.append(i)
return r
def g():
for i in range(100):
yield i
X:\>python --version
Python 3.6.1 :: Continuum Analytics, Inc.
X:\>python -m timeit -s "from perf import f, g" "f()"
100000 loops, best of 3: 6.79 usec per loop
X:\>python -m timeit -s "from perf import f, g" "for i in g(): pass"
100000 loops, best of 3: 4.59 usec per loop
And this results match my expectations. Perhaps is there something else in pluggy's code that might be causing this difference?
What if we time the two approaches to measure them? We can use the code from your test as basis:
from pluggy import PluginManager, HookspecMarker, HookimplMarker
hookspec = HookspecMarker("example")
hookimpl = HookimplMarker("example")
pm = PluginManager("example")
class Hooks(object):
@hookspec
def he_method1(self, arg):
pass
pm.add_hookspecs(Hooks)
class Plugin1(object):
@hookimpl
def he_method1(self, arg):
pass
class Plugin2(object):
@hookimpl
def he_method1(self, arg):
pass
class Plugin3(object):
@hookimpl
def he_method1(self, arg):
pass
class Plugin4(object):
@hookimpl
def he_method1(self, arg):
pass
class PluginWrapper(object):
@hookimpl(hookwrapper=True)
def he_method1(self, arg):
yield
pm.register(Plugin1())
pm.register(Plugin2())
pm.register(Plugin3())
pm.register(Plugin4())
pm.register(PluginWrapper())
X:\>python -m timeit -s "from perf2 import pm" "for x in pm.hook.he_method1(arg=None): pass"
100000 loops, best of 3: 8.19 usec per loop
Duh, why don't I do it myself? 😁
Forked your branch and get nearly the same timings:
{env35} X:\>python -m timeit -s "from perf2 import pm" "for x in pm.hook.he_method1(arg=None): pass"
100000 loops, best of 3: 8.15 usec per loop
{env35} X:\>python -m timeit -s "from perf2 import pm" "for x in pm.ihook.he_method1(arg=None): pass"
100000 loops, best of 3: 8.28 usec per loop
Unless I'm missing something, their performance is nearly identical, so IMHO we should go to just change hook to always return an iterable.
@nicoddemus ahh yeah I was testing it the opposite way...
I was benching calling a generator which skips yielding as opposed to just embracing it and capturing the output in a list. Your way is better.
The only thing I wonder is how does it perform under different numbers of hooks/wrappers as in the benchmark test suite. I think we should try that out first while there's separate implementations.
You can make the changes in testing/benchmark.py if you're so bold ;)
i would like to bring attention to the fact that hookwrappers are unable to semantically correctly operate if we stream partial results out without giving them a chance to operate/alter the stream
i would like to bring attention to the fact that hookwrappers are unable to semantically correctly operate if we stream partial results out without giving them a chance to operate/alter the stream
Hmm indeed that is a problem, because the post-hookwrappers can't alter the outcome, because the outcome has already been processed by the caller.
Not sure what we should do here, it seems to be a big blocker.
i beleive for this feature we have the choice between massive cost in terms of conceptual complexity for implementation or glaring conceptual holes in the api
i leaning towards avoiding it due to inconsistency
Hmm indeed that is a problem, because the post-hookwrappers can't alter the outcome, because the outcome has already been processed by the caller.
@RonnyPfannschmidt @nicoddemus yes agreed. This is why I was implementing it separate from the main hook call set. I think @RonnyPfannschmidt is right you can't combine to the 2 ideas which makes me wonder if the only way we could support it is by having seperately marked iterable hooks which are also called from a different _HookRelay. We can support iterable hooks but wrappers in that case aren't a thing?
For the use cases I have in mind, making the hook explicitly iterable would be fine, if not even preferred. If such hooks then don't support hookwrappers would be fine by me.
From a user standpoint, we would mark it as:
@hookspec(iterable=True)
def myproject_get_password(self, arg1, arg2):
"""
"""
?
@nicoddemus yeah I think that could work.
The question is do we just call such hooks the same way as others - pm.hook.myproject_get_password()? I guess that's fine yeah?
@nicoddemus, @RonnyPfannschmidt @fschulze just had a convo with @vodik about this and I think we could make an alternative version of what a wrapper is for iterable hooks that's even more powerful.
Instead of having setup and teardown before and after all hookimpl calls are complete, instead define a wrapper as more of an intercepting generator.
That is a hookwrapper is a generator which looks as follows:
@hookimpl(wrapper=True):
def myproject_get_password(arg1, arg2):
iter_results = yield
for result in iter_results:
# intercept and do something with each result
yield result # delivers the hookimpl call result to the hook caller (or next hookwrapper in line)
# do something after the result has been delivered but before the next
# hook call has been invoked inside _itercall
Here iter_results is the generator returned from _itercall().
We could also do some even more interesting stuff in py3.3+. We can actually support the original wrapper semantics in a cleaner way with yield from:
@hookimpl(wrapper=True):
def myproject_get_password(arg1, arg2):
iter_results = yield
# do the normal pre-call stuff
try:
results = yield from iter_results # yields every hookcall upwards to caller; returns full list of yielded results
except BaseException as err:
if raise_on_err:
raise err
else:
# do the normal post-call stuff
results.clear()
results.append('my_override_result')
return results # this could still be used for the normal (non-iterable) hook calls
Which has the extra nice nicety that we really wouldn't need the pluggy.callers._Result type any more.
We of course would offer these types of iterable hooks (and their corresponding wrappers) alongside the existing API and could encourage migration if we like the new version better?
Let me know what you guys think!
it has a simple question - what does this do to python2.7
@tgoodlet about your first example, when you say yield result, what does the caller (which is calling the wrapper) does with that result? Replace the original result?
We of course would offer these types of iterable hooks (and their corresponding wrappers) alongside the existing API and could encourage migration if we like the new version better?
At first it does seem the new version is more flexible but I think it is a little more complex to write/understand, so I'm not sure it is better. It might be just a matter of getting used to it though. Don't know, have to think about it a little better.
it has a simple question - what does this do to python2.7
IIRC it should work the same, except yield from would have to be replaced by an explicit loop. Instead of:
results = yield from iter_results
In py27 one has to write:
results = []
for result in iter_results:
yield result
results.append(result)
@tgoodlet please correct me if I'm wrong.
your first example, when you say yield result, what does the caller (which is calling the wrapper) does with that result? Replace the original result?
@nicoddemus yes exactly. So a wrapper is able to intercept each value before delivered to the caller. This provides a way to execute before/after each iteration as well as modify the value or, not deliver it at all.
For example the wrapper could iterarate all hook calls but not deliver any results up to the caller until it's prepossessed them:
@hookimpl(wrapper=True):
def myproject_get_password(arg1, arg2):
iter_results = yield
results = []
for result in iter_results:
results.append(result)
# do some checks over all results
if 'myresult' in results:
pass
else:
results.append('myresult')
# do the preprocessing of each result
for result in filter(preprocess, results):
yield result # finally delivered to caller
At first it does seem the new version is more flexible but I think it is a little more complex to write/understand, so I'm not sure it is better.
hookimpl still are iterated the same way. The only difference is that wrappers now must be implemented as generator functions.
IIRC it should work the same, except yield from would have to be replaced by an explicit loop.
Yes of course. I was just pointing out how nice it would look. Also, there's some optimizations python does with yield from which are pretty slick. But yes of course you can do the loop in py2.7.
@tgoodlet as far as i can tell thats a glefully fractioned controll flow using the same structural mechanism for absolutely distinct meanings, im reasonably sure this will break people necks from partial/complete misuse
im pretty opposed to is as of now - if you cant demonstrate a reasonable api for the hookwrappers, just disallow them for now
from my pov this cant possibly have a reasonable api without async await/async for/
I agree with @RonnyPfannschmidt, my gut feeling is that this is creating a complex control flow which might be problematic in the future (it is a fun thought exercise though).
As @RonnyPfannschmidt said, I think we should go ahead with the original idea of iterable hook calls but without hookwrapper support; if we come up with a nice solution in the future, it is just a matter to allow hookwrappers from that point on.
@nicoddemus @RonnyPfannschmidt cool I'll see if I can finalize it this week.
Hey guys I'll crank this out over the weekend but I need #101 to land first to avoid conflicts.