activitysim icon indicating copy to clipboard operation
activitysim copied to clipboard

Expression profiling (legacy mode)

Open jpn-- opened this issue 7 months ago • 1 comments
trafficstars

Addresses #906. This pull request introduces a new performance profiling feature for expression evaluation in the ActivitySim framework. The feature allows developers to track and log the runtime of individual expressions, providing insights into potential bottlenecks in complex models. Key changes include the integration of a performance timer, updates to various core functions to support profiling, and new configuration settings for controlling profiling behavior.

Performance Profiling Feature

  • Integration of Performance Timer:

    • Added the EvalTiming class to measure and log the execution time of expressions. The timer is initialized conditionally based on the new expression_profile setting. (activitysim/abm/models/summarize.py, activitysim/core/assign.py, activitysim/core/interaction_simulate.py, activitysim/core/simulate.py) [1] [2] [3] [4]
  • Expression Timing Wrappers:

    • Wrapped key expression evaluations with performance_timer.time_expression to measure the execution time of individual expressions. (activitysim/core/assign.py, activitysim/core/interaction_simulate.py, activitysim/core/simulate.py) [1] [2] [3]
  • Log Writing:

    • Added functionality to write performance logs to a file after the evaluation process is complete. (activitysim/abm/models/summarize.py, activitysim/core/assign.py, activitysim/core/interaction_simulate.py, activitysim/core/simulate.py) [1] [2] [3] [4]

Configuration Updates

  • New Profiling Settings:

    • Introduced expression_profile and expression_profile_cutoff settings to enable/disable profiling globally or for specific components, and to filter out expressions based on runtime thresholds. (activitysim/core/configuration/top.py)
  • Documentation for Profiling:

    • Added docstrings to explain the purpose and usage of the new profiling settings and performance log file. (activitysim/core/configuration/base.py, activitysim/core/configuration/top.py) [1] [2]

Core Function Enhancements

  • Trace Label Support:

    • Extended several core functions to accept a trace_label parameter for more granular logging and profiling. (activitysim/core/simulate.py) [1] [2]
  • Performance Reporting:

    • Added a summary report generation step to analyze and output the slowest expressions across all models. (activitysim/cli/run.py)

This enhancement is primarily aimed at developers and advanced users who need to optimize model performance. While it introduces some overhead during execution, it provides valuable insights for debugging and improving complex expressions.

jpn-- avatar Apr 20 '25 20:04 jpn--

Example outputs from the SANDAG test model:

Or: Linked here. Note: if you want to see the results, you'll need to click the first links, or download the HTML files, the content is rendered as embedded JavaScript and the OneDrive preview probably won't show anything.

jpn-- avatar Apr 21 '25 16:04 jpn--

In order to test this, do I need to add any other settings beyond: expression_profile: True

Thanks!

stefancoe avatar Jul 24 '25 03:07 stefancoe

Ignore my question above, I figured out why I was not seeing any outputs.

stefancoe avatar Jul 24 '25 15:07 stefancoe

@jpn--, I ran into the following error when testing this branch in mc logsum in school location: image

Our model has been tested to run on asim v1.3.4 and v1.4.0.

Any suggestions? Let me know if you need anything from me to diagnose this issue.

bwentl avatar Jul 24 '25 17:07 bwentl

I ran into the same issue that @dhensle describes above.

stefancoe avatar Jul 24 '25 18:07 stefancoe

@jpn-- , thanks for the suggestion at our meeting today. It turns out my previous error was indeed due to file path being too long. I moved my working folder to C:/model folder and it finished successfully.

A solution to the path issue, and could also make the expression profiling more user friendly is to move all the logs into one csv file. That way, we can filter, sort and review what expression in what model are taking the most time. Similar to the memory_profile.csv.

One potential issue with exporting the timing and expression to csv is that many expressions may contain commas, thus causing potentially messy formatting for csv. We could clean up all the commas in the expressions to other characters like _ or . something similar as a workaround. To avoid confusion as to which line and expression the timing is for, we could also export row number and source model in addition to the actual expression.

Here is an example output to illustrate what the csv file could look like:

time_µsec,component,row_number,expression
0,trip_matrices.preprocessor.write_trip_matrices_annotate_trips_preprocessor,1,trips.tour_id.map(tours.number_of_participants)
21463,trip_matrices.preprocessor.write_trip_matrices_annotate_trips_preprocessor,2,od_skims['DIST']
4999,trip_matrices.preprocessor.write_trip_matrices_annotate_trips_preprocessor,3,np.where((reindex(persons.income_ df.person_id)>=150000)_5_reindex(persons.income_segment_ df.person_id))
...

bwentl avatar Jul 24 '25 18:07 bwentl

@stefancoe @dhensle I have found and fixed the problem you observed, which turns out was a Windows thing. I added a GitHub actions test that runs the expression profiling on a windows server to ensure it's fixed.

jpn-- avatar Jul 25 '25 22:07 jpn--

@bwentl I like your idea, although it's not a very simple change to implement. Due to potential preprocessing of spec files to remove zero-result rows, we cannot infer the SPEC file row number from the expressions available at the point the profiling occurs. I think it's more reliable to just keep strings exactly as they are evaluated. I'll open an issue to consider this for the future. In the meantime I added a warning message, if a log file write fails it will suggest the user check on file path length.

jpn-- avatar Jul 25 '25 22:07 jpn--

@jpn-- On the call I reported that I was still seeing the issue that @dhensle posted above. I wondered if it had to do with using the same output directory as my previous tests so I deleted all the files in the output directory and re-ran. This solved the issue. So it was the old files, not encoded as UTF-8? that must have been my issue.

stefancoe avatar Jul 31 '25 19:07 stefancoe

@jpn-- @stefancoe I can confirm the same behavior -- it crashed with the same error if the old files existed, but it works with a clean output folder.

However, this test does uncover an issue that the log files from previous runs are still in the expr-performance output folder and new logs are added with an appended "-1" to the file name. This results in the output timing statements actually counting double if a run is already in the output folder!

I think the functionality should be updated to delete old timing files before writing new ones.

dhensle avatar Jul 31 '25 23:07 dhensle

I think the functionality should be updated to delete old timing files before writing new ones.

I imagine users potentially want to keep older expression timing results around to compare before-and-after when applying some treatment to change the results, so I am hesitant to just wipe them automatically. How about if I collect results into a timestamped subdirectory of the expression profiling? So instead of writing to ./log/expr-performance/component.whatever.log we would write to ./log/expr-performance/20250731-2359/component.whatever.log. The end of run summaries would then also go into ./log/expr-performance/20250731-2359 and not double count or overwrite anything.

jpn-- avatar Aug 01 '25 01:08 jpn--

I think the functionality should be updated to delete old timing files before writing new ones.

I imagine users potentially want to keep older expression timing results around to compare before-and-after when applying some treatment to change the results, so I am hesitant to just wipe them automatically. How about if I collect results into a timestamped subdirectory of the expression profiling? So instead of writing to ./log/expr-performance/component.whatever.log we would write to ./log/expr-performance/20250731-2359/component.whatever.log. The end of run summaries would then also go into ./log/expr-performance/20250731-2359 and not double count or overwrite anything.

Sounds good to me. Saves the user from having to generate multiple output directories when testing/comparing different expressions.

stefancoe avatar Aug 01 '25 08:08 stefancoe

How about if I collect results into a timestamped subdirectory of the expression profiling?

Sounds good to me too.

dhensle avatar Aug 01 '25 16:08 dhensle

Should now be working as described above, with a timestamped log file.

jpn-- avatar Aug 05 '25 01:08 jpn--

I can confirm the timestamped outputs as described above for me. I have no issues with the code changes either. I think the only outstanding issue is the additional runtime question above.

dhensle avatar Aug 06 '25 18:08 dhensle

I was finally able to get around to testing it on my end. It seemed to work just fine, though I did get the screenshotted error at the end. I'm not sure if that's related (and all of the outputs were written). I have attached one of the outputs in case anyone would like to review it to confirm that it looks correct. ErrorAtEnd

JoeJimFlood avatar Aug 07 '25 16:08 JoeJimFlood

@JoeJimFlood

  1. I will look into that error message, and see if I can fix it.
  2. Runtime on this was 7078 seconds. How long does it take to run an otherwise identical run on the same server, but without the expression timing enabled?

jpn-- avatar Aug 07 '25 16:08 jpn--

@jpn-- I can also do a runtime test. My tests so far have been with our small set of inputs, which runs in about 12 minutes, single process. I did do a single process run with a 10 percent sample using the full inputs. I can do this with expression profiling turned off for a comparison. We usually run 100 percent samples with MP, but I imagine that should be off for this test?

I tested the timestamped output folder and it works as expected.

stefancoe avatar Aug 07 '25 17:08 stefancoe

@jpn-- I just reran the test on the same server using the same environment but setting expression_profile: False

  1. That error didn't happen this time. If I recall correctly I've seen it before within the past month or so. There's something about errors like this one and the one seen in #959 that's a bit concerning to me when using the model in application. The crashes stop the entire modeling system despite the fact that all of the outputs from ActivitySim are created.
  2. 6283 seconds

I also shared the results with @bhargavasana, and he agreed with me that it would be more useful if the process name were included in the folder name and not just the timestamp for when it was created. It took me a few tries to find the file I shared in my last comment. image

JoeJimFlood avatar Aug 07 '25 19:08 JoeJimFlood

Oh, we should not have all those timestamped directories... that's a bug I will fix. There should only be a single timestamp for the entire multiprocessor run, stamped at the beginning of that run by the primary process. All the subprocesses should write into subdirectories of that one timestamped directory.

As for the error: it's clearly an error in the tool that merges the logs into a pair of html analysis outputs, the component and subcomponent reports. These are two output files in addition to all the other outputs you expect from an ActivitySim run, and I am 99% sure your first attempt did not produce these two final summary files. It didn't crash for the second run because it didn't run the expression profiling summary step where the error lives.

And: users should never have expression profiling enabled in "application" or "production" mode. This is a developer's tool, and your results show us that it costs about 13% more runtime overall. Only developers who will actually look at the performance results and try to tune the expressions to run faster should be using it.

jpn-- avatar Aug 07 '25 19:08 jpn--

Ah, I had misread the error message. About a month ago we had an error in application when the pipeline files were being written and I didn't look very closely and assumed it was the same thing.

JoeJimFlood avatar Aug 07 '25 22:08 JoeJimFlood

@jpn-- Did the naming of the files work as intended? image

JoeJimFlood avatar Aug 15 '25 15:08 JoeJimFlood

@JoeJimFlood Yes, your most recent message looks correct. @dhensle Since you previously "request changes", I am blocked from merging this until you now approve it.

jpn-- avatar Aug 16 '25 16:08 jpn--

@jpn-- I think the updated text from Joe's full-scale sandag testing need to be added still.

dhensle avatar Aug 18 '25 17:08 dhensle

@dhensle I made the update to the docs in this PR

jpn-- avatar Aug 25 '25 13:08 jpn--

approved, thanks @jpn--

dhensle avatar Aug 25 '25 15:08 dhensle