Speed tests with a Stream?
I am trying to compare the speed of operation when a Stream is introduced into the code. Both examples below are standalone scripts, not shell sessions.
A 'plain' Python example:
from timeit import default_timer as timer

LIMIT = 100000

def sequencer(limit=LIMIT):
    num = 0
    while num < limit:
        yield num
        num += 1

L = []
start = timer()
for i in sequencer():
    L.append(i)
end = timer()
print(end - start)
This takes about 17 milliseconds to run.
An attempt to replicate the above with streamz:
from timeit import default_timer as timer
from streamz import Stream

LIMIT = 100000

def sequencer(limit=LIMIT):
    num = 0
    while num < limit:
        yield num
        num += 1

source = Stream.from_iterable(sequencer())
L = source.sink_to_list()
start = timer()
source.start()
while True:
    if len(L) >= LIMIT:
        break
end = timer()
print(end - start)
This takes about 1200 milliseconds to run. Is this the expected slowdown (two orders-of-magnitude)?
I am not sure, though, whether the streamz code is correct. It's based on the examples in the docs, but those examples all seem geared towards use in the shell rather than in standalone scripts. If you omit the while True: section, no data ends up in the output list.
The overhead of streamz is essentially from running async coroutines. Each task adds a small overhead to the call, so this will show up as significant in cases where the function itself is extremely fast - like this case. In all normal operation, ~10us of overhead per task would be totally negligible. You could use an event loop other than the standard asyncio one to mitigate the issue, if it actually is significant for you.
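To get a feel for the per-task cost described above, here is a minimal sketch using only the standard library. It is not how streamz is implemented internally; it just schedules one asyncio task per item, which is an assumption made for illustration. The names `plain_append`, `append_one` and `task_append` are made up for this example, and the numbers it prints will vary by machine and Python version.

```python
import asyncio
from timeit import default_timer as timer

N = 100_000  # same item count as LIMIT in the scripts above


def plain_append(n):
    """Append n integers to a list with an ordinary loop."""
    out = []
    for i in range(n):
        out.append(i)
    return out


async def append_one(out, i):
    """Trivial coroutine: the only work is a single list append."""
    out.append(i)


async def task_append(n):
    """Append n integers, scheduling each append as its own asyncio task."""
    out = []
    for i in range(n):
        # create_task schedules the coroutine on the event loop; awaiting it
        # yields to the loop, which is where the per-task cost is paid.
        await asyncio.create_task(append_one(out, i))
    return out


start = timer()
plain_result = plain_append(N)
plain_elapsed = timer() - start

start = timer()
task_result = asyncio.run(task_append(N))
task_elapsed = timer() - start

print(f"plain loop:    {plain_elapsed * 1e3:.1f} ms")
print(f"asyncio tasks: {task_elapsed * 1e3:.1f} ms")
print(f"extra cost per task: {(task_elapsed - plain_elapsed) / N * 1e6:.1f} us")
```

Both versions build the same list; only the scheduling differs, so the gap between the two timings is a rough estimate of what the event loop adds per item when the work itself is nearly free.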
@martindurant I agree this is unlikely to be at all significant in real-world cases. I was just trying to compare streamz to an in-house streaming framework, and the "do almost nothing" case was the most straightforward. It's useful to know that the async coroutines are the main contributors.
(You did not comment on the use of the while loop, so I assume this is the way to go?)
You did not comment on the use of the while loop, so I assume this is the way to go?
I don't have anything against it in principle, except that repeatedly checking the list size may itself be slowing down execution (because of Python's GIL). Perhaps add a short sleep inside the loop? That will add a small amount to the measured time, but it is spread across the many iterations of the loop.
I will try that.
I was just wondering if there was any other way to 'wait' for the stream to finish processing. In the console you don't need this and all the data from the source will end up in the sink; but when you run it as a script (without the while), the sink is just empty.
I probably should have ended here by saying that the tests commonly use a wait_for function, which is essentially a sleep-check loop.
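For reference, a sleep-check loop of that kind might look like the sketch below. This is a stand-alone illustration, not the helper from the streamz test suite: the signature (`predicate`, `timeout`, `period`) is an assumption, and a background thread stands in for the stream so the example runs without streamz installed.

```python
import threading
import time


def wait_for(predicate, timeout, period=0.001):
    """Poll predicate() until it is true, sleeping `period` seconds between checks.

    Raises TimeoutError if the predicate is still false after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while not predicate():
        if time.monotonic() > deadline:
            raise TimeoutError("condition not met within timeout")
        # Sleeping releases the GIL so the producer can make progress.
        time.sleep(period)


LIMIT = 100000
L = []


def producer():
    """Stand-in for the stream: fills L from another thread."""
    for i in range(LIMIT):
        L.append(i)


worker = threading.Thread(target=producer)
worker.start()
wait_for(lambda: len(L) >= LIMIT, timeout=10)
worker.join()
print(len(L))
```

In the streamz script above, the `while True:` block could be swapped for a call like `wait_for(lambda: len(L) >= LIMIT, timeout=10)`. The sleep adds at most one `period` to the measured time, and the timeout keeps the script from hanging if the sink never fills.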
In any case, I think this conversation came to a reasonable end? I wonder how well streamz stood against your in-house framework.