
ActivitySim freezing when creating threads with no error message

Open JoeJimFlood opened this issue 8 months ago • 6 comments

Describe the bug

While upgrading ABM3 to work with v1.3.4, I've had some instances where ActivitySim freezes while creating the various threads in our mp_households group of steps (which is the bulk of our resident model). ActivitySim appears to keep running, but nothing is happening. To stop it when running in EMME, I need to open the command line and individually close every Python process that had been created. I have attached a logfile showing that 40 processes should be created, but it stops after the 31st with no explanation. I've mostly been able to get around this by lowering the number of processors, but a test I ran today showed that this doesn't completely eliminate the risk of the issue recurring.

To Reproduce

I'm not sure of any conditions that reproduce the issue other than rerunning tests I've already run. If I can get more specific information, I will update this issue with it.

Expected behavior

The threads should be spun up and run with no issue. If there is an issue in creating the threads, the model should be stopped and an error message should appear.
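To illustrate the expected behavior, here is a minimal sketch of a watchdog that stops every child process and raises an error if any of them is still running after a timeout. It uses only Python's standard `subprocess` module. It is not ActivitySim's multiprocessing code, and the function name `run_with_watchdog` and its parameters are hypothetical.

```python
import subprocess
import sys
import time


def run_with_watchdog(commands, timeout):
    """Run each command as a child process.

    If any child is still running once `timeout` seconds have passed,
    kill every remaining child and raise an error instead of hanging.
    (Hypothetical helper for illustration, not part of ActivitySim.)
    """
    procs = [subprocess.Popen(cmd) for cmd in commands]
    deadline = time.monotonic() + timeout
    try:
        for proc in procs:
            # All children share one overall deadline.
            remaining = max(0.0, deadline - time.monotonic())
            proc.wait(timeout=remaining)
    except subprocess.TimeoutExpired as exc:
        # Clean up so no orphaned Python processes are left behind.
        for proc in procs:
            if proc.poll() is None:
                proc.kill()
            proc.wait()
        raise RuntimeError(
            f"a child process did not finish within {timeout}s; "
            "all children were stopped"
        ) from exc
    return [proc.returncode for proc in procs]


if __name__ == "__main__":
    quick = [sys.executable, "-c", "pass"]
    print(run_with_watchdog([quick, quick], timeout=30))
```

With a guard like this, a stalled child would surface as an error message rather than a silent freeze, and the remaining processes would not need to be closed by hand.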

JoeJimFlood avatar Mar 22 '25 00:03 JoeJimFlood

I've got a way for people with access to SANDAG's servers to see this.

  1. Remote into our server Cougar (please check in task manager to see if someone else is using it before continuing).
  2. Open Anaconda Prompt.
  3. Run the following commands:
activate asim_134 :: Environment with ActivitySim 1.3.4
cd C:\abm_runs\jflood\abm3_asim132_commandline
run_asim.bat

It should run for a while and then freeze while the threads for the mp_households process are being spun up (probably after mp_households_30).

JoeJimFlood avatar Mar 28 '25 20:03 JoeJimFlood

This might be related to #883, which is still unsolved.

i-am-sijia avatar Apr 01 '25 18:04 i-am-sijia

Possibly. In that issue did the processes that were able to start complete? In my runs everything is just freezing. I did check the number of sockets on the servers where it's happened and one had 2 and the other had 1.

JoeJimFlood avatar Apr 01 '25 23:04 JoeJimFlood

No, they did not complete. Everything was freezing like yours. If it also failed on a 1-socket machine, then the socket count might not be the cause. I still wonder if this could be machine-specific, because I did not have this problem when running the same settings from issue 883 on a WSP machine. Were you able to run ABM3 with 40 processors before? I don't have access to the SANDAG machine.

In addition to this problem, may I ask if you've tried running ABM3 with fewer processors and observed how the run time changes as you add more? Based on the tests David and I did in issue https://github.com/ActivitySim/sandag-abm3-example/issues/9, more processors don't always give a better run time. You might be able to get a similar, if not lower, run time with fewer processors.

i-am-sijia avatar Apr 03 '25 15:04 i-am-sijia

Ah, it does seem like it might be the same issue then.

I haven't noticed an increase in runtime when lowering the number of processors, and in fact may have seen a slight improvement.

JoeJimFlood avatar Apr 04 '25 23:04 JoeJimFlood

I've run into this issue on 1.3.2 with the SANDAG model. Asim will make the pipeline groups for whatever num_processors I tell it, but it would max out at 11 processes (mp_households0 through mp_households10) and get to a point where it stops. I'm guessing it's related to processes 12-n not starting or finishing.

Interestingly, I did NOT run into this issue at RSG with the MWCOG model, I was able to run 20 processors on it - that was likely with 1.3.1.

AndrewTheTM avatar Apr 23 '25 18:04 AndrewTheTM