activitysim
potential performance improvements
This issue is for keeping track of potential performance improvement ideas:
- reduce expression solving time and memory needs by better handling string data as pandas categorical data
- improve parallelization by taking advantage of updates to Python 3’s multiprocessing library
- continue to improve chunksize calculations for more optimized multiprocessing setups
- review ct-ramp and daysim performance ideas
Please add other ideas, thanks
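To make the categorical idea concrete, here is a minimal sketch on made-up data (the `tour_mode` column and mode names are illustrative, not taken from any ActivitySim table). It only shows that the same string comparison works on a categorical column while the column itself takes much less memory; actual savings in a model run would need to be measured.

```python
import numpy as np
import pandas as pd

# Synthetic data: a low-cardinality string column, repeated many times.
rng = np.random.default_rng(0)
modes = np.array(["DRIVEALONE", "SHARED2", "WALK", "BIKE", "WALK_TRANSIT"], dtype=object)
tours = pd.DataFrame({"tour_mode": rng.choice(modes, size=100_000)})

as_strings = tours["tour_mode"]
as_category = tours["tour_mode"].astype("category")

# deep=True counts the string payloads, not just the pointers.
string_bytes = as_strings.memory_usage(deep=True)
category_bytes = as_category.memory_usage(deep=True)

# The same expression works on both representations.
is_walk_strings = (as_strings == "WALK").to_numpy()
is_walk_category = (as_category == "WALK").to_numpy()

print(f"strings: {string_bytes:,} bytes, categorical: {category_bytes:,} bytes")
```

One caveat with this approach: operations that combine categoricals with different category sets (concatenation, merges) can silently fall back to strings, so the categories would need to be defined consistently up front.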
A configuration file switch that can disable trip-level processing for tours based on tour mode. So, you can shut off (skip) stop_frequency, trip_purpose, trip_destination, trip_scheduling, and trip_mode_choice for walk and bike tours if you don't care about those trips (e.g. inside a global feedback loop iteration, I don't care about walk or bike trips as they don't impact congestion).
Bonus points: the ability to easily flip the switch the other way and restart only the filtered tours (e.g. once I have finished all my global feedback loops and want those non-motorized trips back).
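A rough sketch of what such a switch could look like. The setting name `skip_trip_models_for_tour_modes` and the helper `split_tours_for_trip_models` are hypothetical; nothing like them is confirmed to exist in ActivitySim. The idea is just to partition the tours table before the trip-level models run and keep the skipped tours around so they can be processed later.

```python
import pandas as pd

# Hypothetical setting, as it might appear after loading a settings file.
settings = {"skip_trip_models_for_tour_modes": ["WALK", "BIKE"]}

# Toy tours table for illustration only.
tours = pd.DataFrame(
    {"tour_id": [1, 2, 3, 4], "tour_mode": ["DRIVEALONE", "WALK", "BIKE", "SHARED2"]}
).set_index("tour_id")


def split_tours_for_trip_models(tours, settings):
    """Return (active, deferred): tours to process now and tours to skip for now."""
    skip_modes = settings.get("skip_trip_models_for_tour_modes", [])
    skipped = tours["tour_mode"].isin(skip_modes)
    return tours[~skipped], tours[skipped]


active, deferred = split_tours_for_trip_models(tours, settings)
# Inside a feedback loop, only `active` would go through the trip models.
# After the final iteration, `deferred` could be run through the same models.
```

Flipping the switch back would amount to running the trip models on `deferred` only, which assumes those models do not depend on results from the already processed tours.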
@stefancoe - add reading skim data from disk on-demand as opposed to reading every skim into RAM at the start as a way to trade runtime for RAM. @toliwaga implemented an undocumented version of this during the TVPB caching research and it runs slower but uses a lot less RAM. We may want to complete this feature for general use.
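For illustration, one way to trade runtime for RAM is a memory-mapped array, where only the pages actually touched by a lookup are read from disk. This is a sketch under assumptions: the file layout, skim names, and `lookup` helper are made up here, and this is not necessarily how the undocumented implementation mentioned above works.

```python
import os
import tempfile

import numpy as np

# Illustrative dimensions and names, not real skim data.
n_zones = 50
skim_names = ["SOV_TIME", "SOV_DIST"]
shape = (len(skim_names), n_zones, n_zones)
path = os.path.join(tempfile.mkdtemp(), "skims.dat")

# Write a synthetic skim file once (stands in for an existing skim cache).
writer = np.memmap(path, dtype="float32", mode="w+", shape=shape)
writer[:] = np.arange(np.prod(shape), dtype="float32").reshape(shape)
writer.flush()
del writer

# Open read-only: nothing is loaded into RAM until it is indexed.
skims = np.memmap(path, dtype="float32", mode="r", shape=shape)


def lookup(skim_name, orig, dest):
    """Fetch skim values for arrays of origin and destination zone indexes."""
    k = skim_names.index(skim_name)
    return np.asarray(skims[k, orig, dest])


values = lookup("SOV_DIST", np.array([0, 3]), np.array([1, 4]))
```

Random origin-destination lookups against a memory map are typically slower than against an in-memory array, which matches the slower-but-leaner behavior described above.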
Some more ideas from discussions with SANDAG:
- Move from strings to factors
- Exponentiate ahead of time TAP to TAP utilities, along with pre-computing access/egress costs
- Smarter binary search / picking of an alternative from a large choice set (such as for location choice)
- Make trip destination (i.e. intermediate stop location choice) aware of the tour mode so:
  - For bike, walk, transit to reduce the set of possible mazs ahead of time
  - For auto, to pre-compute TAZ to TAZ total utilities to avoid duplication of calculations
- Smarter chunking calculations to get more throughput #406
- Continued expression review/tidying up to reduce redundancy of calculations (i.e. optimization of written expressions)
- Buy a bigger / faster server and test ahead of time in the cloud what’s possible with respect to runtime reductions
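As one possible reading of the binary search item above, the sketch below picks an alternative per chooser by binary-searching the cumulative probabilities instead of scanning every alternative. The function names are made up for this example, and it is not a description of how ActivitySim currently makes choices. The row-offset trick lets a single `np.searchsorted` call handle all choosers at once.

```python
import numpy as np


def choose_linear(probs, rands):
    """Baseline: compare each random draw against every cumulative probability."""
    cum = np.cumsum(probs, axis=1)
    cum[:, -1] = 1.0  # guard against round-off in the last column
    return (cum > rands[:, None]).argmax(axis=1)


def choose_binary_search(probs, rands):
    """Binary search per chooser, done with a single searchsorted call."""
    n_rows, n_alts = probs.shape
    cum = np.cumsum(probs, axis=1)
    cum[:, -1] = 1.0
    # Shift row i into the interval (i, i + 1] so the flattened array is sorted.
    offsets = np.arange(n_rows)
    flat = (cum + offsets[:, None]).ravel()
    positions = np.searchsorted(flat, rands + offsets, side="right")
    return positions - offsets * n_alts


probs = np.array([[0.2, 0.3, 0.5], [0.0, 1.0, 0.0]])
rands = np.array([0.6, 0.0])  # draws are assumed to lie in [0, 1)
choices = choose_binary_search(probs, rands)
```

Adding the row offsets costs some floating point precision for very large numbers of choosers, so a production version would probably need chunking or a compiled per-row search. Computing the cumulative sum is still linear in the number of alternatives, so the benefit would mainly come when those sums can be reused across draws.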
Some good ideas here to increase pandas performance. The Pandas eval function looks interesting. Could it replace/substitute python eval in some cases?
> Could it replace/substitute python eval in some cases?
Not that we couldn't do it more, but we're already using pandas.eval in several places, for example:
- https://github.com/ActivitySim/activitysim/blob/bcdc7b63d4ff7bc2703810e226090c75c380bda4/activitysim/core/interaction_simulate.py#L146
- https://github.com/ActivitySim/activitysim/blob/bcdc7b63d4ff7bc2703810e226090c75c380bda4/activitysim/core/simulate.py#L443
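For anyone following along, here is a small standalone comparison of the two approaches on a toy table. It is not the code at the links above, and the column names are made up; it just shows the same expression string evaluated with Python's built-in `eval` and with `DataFrame.eval`.

```python
import pandas as pd

# Toy table; column names are illustrative.
df = pd.DataFrame({"income": [20_000, 55_000, 120_000], "autos": [0, 1, 2]})
expr = "(income > 50000) & (autos > 0)"

# Python eval: the columns are passed in as local variables.
via_python = eval(expr, {}, {"income": df["income"], "autos": df["autos"]})

# pandas eval: column names are resolved against the DataFrame directly.
via_pandas = df.eval(expr)

print(via_pandas.tolist())
```

`DataFrame.eval` only accepts a restricted expression syntax, so expressions that call arbitrary Python functions would still need the built-in `eval`. Whether it is faster depends on the data size and on whether numexpr is installed.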
Oh, good to know. Thanks for pointing that out!