
potential performance improvements

Open bstabler opened this issue 4 years ago • 6 comments

This issue is for keeping track of potential performance improvement ideas:

  • reduce expression solving time and memory needs by better handling string data as pandas categorical data
  • improve parallelization by taking advantage of updates to Python 3’s multiprocessing library
  • continue to improve chunksize calculations for more optimized multiprocessing setups
  • review ct-ramp and daysim performance ideas

Please add other ideas, thanks
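To make the first idea concrete, here is a minimal sketch (not ActivitySim code) comparing a string column stored as plain Python objects with the same column stored as a pandas categorical. The column name `tour_mode` and the mode labels are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative data only: the column name and mode labels are assumptions.
rng = np.random.default_rng(0)
modes = np.array(["DRIVEALONE", "SHARED2", "WALK", "BIKE", "WALK_TRANSIT"])
df = pd.DataFrame({"tour_mode": rng.choice(modes, size=100_000)})

# Same values, two storage layouts.
as_object = df["tour_mode"].astype(object)
as_category = df["tour_mode"].astype("category")

# deep=True counts the Python string objects, not just the pointers to them.
object_bytes = as_object.memory_usage(deep=True)
category_bytes = as_category.memory_usage(deep=True)

# A comparison against a label gives the same boolean mask either way;
# the categorical version compares small integer codes internally.
is_walk_object = as_object == "WALK"
is_walk_category = as_category == "WALK"

print(object_bytes, category_bytes)
```

The categorical column stores each distinct label once plus one small integer code per row, which is where the memory saving comes from.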

bstabler avatar Feb 04 '21 05:02 bstabler

A configuration file switch that can disable trip-level processing for tours based on tour mode. So, you can shut off (skip) stop_frequency, trip_purpose, trip_destination, trip_scheduling, and trip_mode_choice for walk and bike tours if you don't care about those trips (e.g. inside a global feedback loop iteration, I don't care about walk or bike trips as they don't impact congestion).

Bonus points: the ability to easily flip the switch the other way and restart only the filtered tours (e.g. I've finished all my global feedback loops and I want those non-motorized trips back now).
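A rough sketch of what such a switch could look like, assuming a tours table with a `tour_mode` column. The setting name `SKIP_TRIP_MODELS_FOR_TOUR_MODES` and the helper `split_tours_for_trip_models` are hypothetical and do not exist in ActivitySim.

```python
import pandas as pd

# Hypothetical setting: tour modes whose trips we do not need right now.
SKIP_TRIP_MODELS_FOR_TOUR_MODES = {"WALK", "BIKE"}


def split_tours_for_trip_models(tours, skip_modes):
    """Return (tours to run through the trip models now, tours deferred)."""
    skip_mask = tours["tour_mode"].isin(skip_modes)
    return tours[~skip_mask], tours[skip_mask]


tours = pd.DataFrame(
    {
        "tour_id": [1, 2, 3, 4],
        "tour_mode": ["DRIVEALONE", "WALK", "BIKE", "WALK_TRANSIT"],
    }
).set_index("tour_id")

to_process, deferred = split_tours_for_trip_models(
    tours, SKIP_TRIP_MODELS_FOR_TOUR_MODES
)
print(list(to_process.index), list(deferred.index))
```

Keeping the deferred tours as their own table is what would make the "bonus points" case possible: after the last feedback iteration, only `deferred` would need to go through the trip-level models.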

jpn-- avatar Feb 05 '21 02:02 jpn--

@stefancoe - add reading skim data from disk on-demand as opposed to reading every skim into RAM at the start as a way to trade runtime for RAM. @toliwaga implemented an undocumented version of this during the TVPB caching research and it runs slower but uses a lot less RAM. We may want to complete this feature for general use.
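One generic way to get this runtime-for-RAM trade is a memory-mapped array. The sketch below uses `numpy.memmap` on a small made-up skim matrix; it is not the implementation mentioned above, and the file layout, zone count, and values are all assumptions.

```python
import os
import tempfile

import numpy as np

n_zones = 50  # assumed size, for illustration only
path = os.path.join(tempfile.mkdtemp(), "skim.dat")

# Write a fake zone-to-zone skim to disk once.
writer = np.memmap(path, dtype="float32", mode="w+", shape=(n_zones, n_zones))
writer[:] = np.arange(n_zones * n_zones, dtype="float32").reshape(n_zones, n_zones)
writer.flush()
del writer

# Open it read-only. Nothing is loaded until elements are accessed, and the
# operating system pages in only the parts of the file that are touched.
skim = np.memmap(path, dtype="float32", mode="r", shape=(n_zones, n_zones))

# Look up a few origin-destination pairs on demand.
orig = np.array([0, 3, 7])
dest = np.array([1, 4, 9])
values = np.asarray(skim[orig, dest])
print(values)
```

Each lookup may hit the disk, so this is slower than an in-memory array, but resident memory grows only with the pages actually read.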

bstabler avatar Feb 10 '21 00:02 bstabler

Some more ideas from discussions with SANDAG:

  • Move from strings to factors
  • Exponentiate TAP-to-TAP utilities ahead of time, and pre-compute access/egress costs
  • Smarter binary search / picking of an alternative from a large choice set (such as for location choice)
  • Make trip destination (i.e. intermediate stop location choice) aware of the tour mode so:
      o For bike, walk, transit to reduce the set of possible mazs ahead of time
      o For auto, to pre-compute TAZ to TAZ total utilities to avoid duplication of calculations
  • Smarter chunking calculations to get more throughput #406
  • Continued expression review/tidying up to reduce redundancy of calculations (i.e. optimization of written expressions)
  • Buy a bigger / faster server and test ahead of time in the cloud what’s possible with respect to runtime reductions
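Two of the ideas above (exponentiating utilities ahead of time, and binary search when picking from a large choice set) can be sketched together. This is a generic illustration with made-up utilities, not ActivitySim's choice code; the function name `make_choices` is used here only for the example.

```python
import numpy as np


def make_choices(exp_utils, rands):
    """Pick one alternative per row from pre-exponentiated utilities.

    exp_utils: (n_choosers, n_alts) array of exp(utility), all positive.
    rands:     (n_choosers,) uniform random numbers in [0, 1).
    """
    cum = np.cumsum(exp_utils, axis=1)
    # Scale the random draw by the row total instead of normalizing every
    # column into a probability.
    targets = rands * cum[:, -1]
    # Binary search in each row's cumulative sum: O(log n_alts) per chooser.
    picks = np.array(
        [np.searchsorted(row, t, side="right") for row, t in zip(cum, targets)]
    )
    # Guard against floating point round-off at the upper edge.
    return np.minimum(picks, exp_utils.shape[1] - 1)


# Exponentiate once, then reuse the result for every draw that shares it.
utilities = np.array([[0.0, 0.5, 1.0], [1.0, 0.0, -1.0]])  # made-up values
exp_utils = np.exp(utilities)

rng = np.random.default_rng(42)
choices = make_choices(exp_utils, rng.random(len(exp_utils)))
print(choices)
```

The saving from pre-exponentiation comes from reuse: if many choosers share the same zone-to-zone utilities, `np.exp` runs once per matrix rather than once per chooser.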

bstabler avatar Apr 27 '21 15:04 bstabler

Some good ideas here to increase pandas performance. The Pandas eval function looks interesting. Could it replace/substitute python eval in some cases?

stefancoe avatar Apr 27 '21 16:04 stefancoe

Could it replace/substitute python eval in some cases?

Not that we couldn't do it more, but we're already using pandas.eval in several places, for example:

  • https://github.com/ActivitySim/activitysim/blob/bcdc7b63d4ff7bc2703810e226090c75c380bda4/activitysim/core/interaction_simulate.py#L146
  • https://github.com/ActivitySim/activitysim/blob/bcdc7b63d4ff7bc2703810e226090c75c380bda4/activitysim/core/simulate.py#L443
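For readers unfamiliar with the difference, here is a small standalone sketch (not taken from the linked files) that evaluates the same expression string with Python's built-in `eval` and with `pandas.eval`. The table and column names are invented for the example.

```python
import pandas as pd

# Invented example data.
df = pd.DataFrame(
    {"distance": [1.0, 5.0, 12.0], "income": [30_000, 80_000, 55_000]}
)

expression = "(df.distance < 10) & (df.income > 50000)"

# Built-in eval: each sub-expression runs as ordinary Python and pandas
# allocates an intermediate Series for every step.
mask_python = eval(expression, {"df": df})

# pandas.eval: the string is parsed by pandas, which can hand the whole
# expression to the numexpr engine when numexpr is installed.
mask_pandas = pd.eval(expression, local_dict={"df": df})

print(mask_pandas.tolist())
```

Any speed-up from `pandas.eval` generally shows up only on large frames; on small ones the parsing overhead can make it slower than plain `eval`.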

jpn-- avatar Apr 27 '21 17:04 jpn--

Oh, good to know. Thanks for pointing that out!

stefancoe avatar Apr 27 '21 17:04 stefancoe