[performance] feed_to_graph_path is slow on larger feeds
`test_feed_to_graph_path` is itself the slowest test by far. Create benchmarks and identify which steps are slowest, then find ways to speed up operations and get the graph creation process to be as fast as possible.
Addressed (but still slow) via https://github.com/kuanb/peartree/pull/14
Used snakeviz with cProfile; this is what the performance breakdown of the operation looks like at present:
`generate_edge_and_wait_values` is the real hog here. It is primarily comprised of two steps:

- `generate_wait_times` (60% of the runtime of the parent function)
- `linearly_interpolate_infill_times` (20% of the runtime of the parent function)

Both are executing Pandas functions, so beneath them are just Pandas ops and `groupby` functions, respectively. To speed this module up, I'll need to better manage the Pandas operations and identify optimizations in how I am using them in the logic.
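For reference, a minimal sketch of how a profile like this can be generated with cProfile and snakeviz. The GTFS path and time window below are placeholder assumptions, not values from this issue:

```python
# Profile graph creation and dump stats for snakeviz.
# The GTFS path and time window are placeholders.
import cProfile
import pstats

import peartree as pt

feed = pt.get_representative_feed('la_metro_gtfs.zip')
start = 7 * 60 * 60   # 7:00 AM, in seconds past midnight
end = 10 * 60 * 60    # 10:00 AM

profiler = cProfile.Profile()
profiler.enable()
G = pt.load_feed_as_graph(feed, start, end)
profiler.disable()

# Inspect interactively with snakeviz: `snakeviz graph_creation.prof`
profiler.dump_stats('graph_creation.prof')

# Or print the heaviest cumulative calls directly
pstats.Stats(profiler).sort_stats('cumulative').print_stats(10)
```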
For example, since these are all wrapped in a single route iteration, the whole operation is embarrassingly parallelizable.
Parallelization with performant pickling enabled via https://github.com/kuanb/peartree/issues/12
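To illustrate the per-route parallelization idea, here is a sketch of the approach, not peartree's actual internals; `process_one_route` and the route ids are hypothetical stand-ins:

```python
# Sketch of per-route parallelization. Each route's edge/wait computation
# is independent of the others, so a process pool can map over routes.
from multiprocessing import Pool

def process_one_route(route_id):
    # Placeholder for the real per-route work (generate_wait_times,
    # linearly_interpolate_infill_times, etc.).
    return route_id, len(str(route_id))

if __name__ == '__main__':
    route_ids = ['route_a', 'route_b', 'route_c']  # hypothetical ids
    with Pool() as pool:
        # Results come back as (route_id, value) pairs to reassemble.
        results = dict(pool.map(process_one_route, route_ids))
    print(results)
```

Note that each task's inputs and outputs must be pickled across process boundaries, which is why the performant pickling work in the issue linked above matters.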
Noticing that the unaccounted-for stop ID management step is taking quite a while:

```
Some unaccounted for stop ids. Resolving 2457...
```

^ Example from the LA Metro GTFS zip file.
On smaller feeds (or even mid-sized feeds, like AC Transit), MP is slower. I need to figure out how to intelligently navigate away from using MP in these situations.
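One possible shape for that gate, purely illustrative; the threshold value and helper name are assumptions, not peartree's logic:

```python
# Hypothetical heuristic: only pay multiprocessing's startup and pickling
# overhead when the feed is large enough to amortize it. The route-count
# threshold is an illustrative guess, not a measured cutoff.
def should_use_multiprocessing(feed, route_threshold=100):
    # partridge-style feeds expose routes as a DataFrame
    return len(feed.routes) > route_threshold
```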
Sigh, this whole performance issue is not good.
Example:

```python
%%time
import time

import peartree as pt

st = time.time()
G_orig = pt.load_feed_as_graph(feed, start, end)  # feed, start, end defined earlier
et = time.time()

# Runtime in seconds
print(round(et - st, 2))
```
The above was run once with MP set to `False` and once with it set to `True`.
No MP:

```
238.4
CPU times: user 3min 57s, sys: 350 ms, total: 3min 57s
Wall time: 3min 58s
```

Yes MP:

```
286.01
CPU times: user 1min 13s, sys: 390 ms, total: 1min 14s
Wall time: 4min 46s
```
Huge performance gain found right here: https://github.com/kuanb/peartree/issues/87
(Thank you @yiyange)
Updated performance, with the last few updates incorporated (see all commits from Wed to today):

- Without MP: 87.5 s (63.3% faster)
- With MP: 93.97 s (67% faster)

(Percentages are relative to the earlier timings: (238.4 − 87.5) / 238.4 ≈ 63.3%, and (286.01 − 93.97) / 286.01 ≈ 67%.)
cc @yiyange
I am curious in what cases using multiprocessing is faster; when I played with it, it was much slower than without it.
There is a higher initialization cost to using multiprocessing. The gains can be seen primarily on larger datasets, such as LA Metro. I should benchmark that.
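A rough harness for that benchmark might look like the following. This is a sketch only: it assumes the boolean MP flag toggled earlier in this thread is exposed as a `use_multiprocessing` keyword, and the feed path and time window are placeholders:

```python
# Compare graph creation time with and without multiprocessing.
# Assumes a use_multiprocessing keyword; path and window are placeholders.
import time

import peartree as pt

feed = pt.get_representative_feed('la_metro_gtfs.zip')
start, end = 7 * 60 * 60, 10 * 60 * 60  # 7:00-10:00 AM, seconds past midnight

for use_mp in (False, True):
    st = time.time()
    G = pt.load_feed_as_graph(feed, start, end, use_multiprocessing=use_mp)
    print(f'use_multiprocessing={use_mp}: {round(time.time() - st, 2)}s')
```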
Whoops sorry didn't mean to close.
LA Metro (without digging around for the exact numbers) used to take 12-15 minutes.
It now takes:

- Without MP: 231 s
- With MP: 229 s

So, no observable improvement from MP itself. Of course, it's running in a Docker environment that only has access to 2 CPUs on my '16 MacBook Pro. A better test would be to use a virtual machine on AWS / GCloud or wherever and see what gains are achieved there.
That said, we can observe that there are pretty limited (essentially no observable) gains to be had by MP for the typical user/use case (local machine, in a Notebook like environment). This is something that should be addressed long term.