[performance] feed_to_graph_path is slow on larger feeds
`test_feed_to_graph_path` is itself the slowest test by far. Create benchmarks and identify which steps are slowest, then find ways to speed up operations and get the graph creation process to be as fast as possible.
Addressed (but still slow) via https://github.com/kuanb/peartree/pull/14
Used snakeviz with cProfile; this is what the performance breakdown of the operation looks like at present:
`generate_edge_and_wait_values` is the real hog here. It is primarily comprised of two steps:

- `generate_wait_times` (60% of the runtime of the parent function)
- `linearly_interpolate_infill_times` (20% of the runtime of the parent function)

Both are executing Pandas functions, so beneath them are just Pandas ops and `groupby` functions, respectively. To speed this module up, I'll need to better manage the Pandas operations and identify optimizations in how I am using them in the logic.
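For reference, a minimal sketch of how a profile like this can be generated with cProfile and snakeviz. The GTFS path and time window below are placeholder assumptions, not values from this issue:

```python
# Profile graph creation and dump stats for snakeviz.
# The GTFS path and time window are placeholders.
import cProfile
import pstats

import peartree as pt

feed = pt.get_representative_feed('la_metro_gtfs.zip')
start = 7 * 60 * 60   # 7:00 AM, in seconds past midnight
end = 10 * 60 * 60    # 10:00 AM

profiler = cProfile.Profile()
profiler.enable()
G = pt.load_feed_as_graph(feed, start, end)
profiler.disable()

# Inspect interactively with snakeviz: `snakeviz graph_creation.prof`
profiler.dump_stats('graph_creation.prof')

# Or print the heaviest cumulative calls directly
pstats.Stats(profiler).sort_stats('cumulative').print_stats(10)
```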
For example, since these are all wrapped in a single route iteration, the whole operation is embarrassingly parallelizable.
Parallelization with performant pickling enabled via https://github.com/kuanb/peartree/issues/12
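To illustrate the per-route parallelization idea, here is a sketch of the approach, not peartree's actual internals; `process_one_route` and the route ids are hypothetical stand-ins:

```python
# Sketch of per-route parallelization. Each route's edge/wait computation
# is independent of the others, so a process pool can map over routes.
from multiprocessing import Pool

def process_one_route(route_id):
    # Placeholder for the real per-route work (generate_wait_times,
    # linearly_interpolate_infill_times, etc.).
    return route_id, len(str(route_id))

if __name__ == '__main__':
    route_ids = ['route_a', 'route_b', 'route_c']  # hypothetical ids
    with Pool() as pool:
        # Results come back as (route_id, value) pairs to reassemble.
        results = dict(pool.map(process_one_route, route_ids))
    print(results)
```

Note that each task's inputs and outputs must be pickled across process boundaries, which is why the performant pickling work in the issue linked above matters.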
Noticing that the unaccounted-for stop ID management step is taking quite a while:

```
Some unaccounted for stop ids. Resolving 2457...
```

^ Example from the LA Metro GTFS zip file.
On smaller feeds (or even mid-sized feeds, like AC Transit), MP is slower. I need to figure out how to intelligently navigate away from using MP in these situations.
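One possible shape for that gate, purely illustrative; the threshold value and helper name are assumptions, not peartree's logic:

```python
# Hypothetical heuristic: only pay multiprocessing's startup and pickling
# overhead when the feed is large enough to amortize it. The route-count
# threshold is an illustrative guess, not a measured cutoff.
def should_use_multiprocessing(feed, route_threshold=100):
    # partridge-style feeds expose routes as a DataFrame
    return len(feed.routes) > route_threshold
```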
Sigh, this whole performance issue is not good.
Example:

```python
%%time
import time

import peartree as pt

st = time.time()
G_orig = pt.load_feed_as_graph(feed, start, end)  # feed, start, end defined earlier
et = time.time()

# Runtime in seconds
print(round(et - st, 2))
```
The above was run once with MP set to `False` and once with it set to `True`.
No MP:

```
238.4
CPU times: user 3min 57s, sys: 350 ms, total: 3min 57s
Wall time: 3min 58s
```

Yes MP:

```
286.01
CPU times: user 1min 13s, sys: 390 ms, total: 1min 14s
Wall time: 4min 46s
```
Huge performance gain found right here: https://github.com/kuanb/peartree/issues/87
(Thank you @yiyange)
Updated performance, with the last few updates incorporated (see all commits from Wed to today):

- Without MP: 87.5 s (63.3% faster)
- With MP: 93.97 s (67% faster)

(Percentages are relative to the earlier timings: (238.4 − 87.5) / 238.4 ≈ 63.3%, and (286.01 − 93.97) / 286.01 ≈ 67%.)
cc @yiyange
I am curious in what cases using multiprocessing is faster; when I played with it, it was much slower than without it.
There is a higher initialization cost to using multiprocessing. The gains can be seen primarily on larger datasets, such as LA Metro. I should benchmark that.
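A rough harness for that benchmark might look like the following. This is a sketch only: it assumes the boolean MP flag toggled earlier in this thread is exposed as a `use_multiprocessing` keyword, and the feed path and time window are placeholders:

```python
# Compare graph creation time with and without multiprocessing.
# Assumes a use_multiprocessing keyword; path and window are placeholders.
import time

import peartree as pt

feed = pt.get_representative_feed('la_metro_gtfs.zip')
start, end = 7 * 60 * 60, 10 * 60 * 60  # 7:00-10:00 AM, seconds past midnight

for use_mp in (False, True):
    st = time.time()
    G = pt.load_feed_as_graph(feed, start, end, use_multiprocessing=use_mp)
    print(f'use_multiprocessing={use_mp}: {round(time.time() - st, 2)}s')
```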
Whoops sorry didn't mean to close.
LA Metro (without digging around for the exact numbers) used to take 12-15 minutes.
It now takes:

- Without MP: 231 s
- With MP: 229 s

So, no observable improvement from MP itself. Of course, it's running in a Docker environment that only has access to 2 CPUs on my '16 MacBook Pro. A better test would be to use a virtual machine on AWS / GCloud or wherever and see what gains are achieved there.
That said, we can observe that there are pretty limited (essentially no observable) gains to be had by MP for the typical user/use case (local machine, in a Notebook like environment). This is something that should be addressed long term.