activitysim
ActivitySim freezing when creating threads with no error message
Describe the bug
While upgrading ABM3 to work with v1.3.4, I've had some instances where ActivitySim freezes while creating the various threads in our mp_households group of steps (which is the bulk of our resident model). ActivitySim appears to keep running, but nothing is happening. To stop it when running in EMME, I need to open the command line and individually close every Python process that had been created. I have attached a logfile showing that 40 processes should be created, but it stops after the 31st with no explanation. I've mostly been able to get around this by lowering the number of processors, but a test I ran today showed that this doesn't eliminate the risk of the issue recurring.
To Reproduce
I'm not sure of any conditions that reproduce the issue other than rerunning tests I've already run. If I can get more specific information, I will update this issue.
Expected behavior
The threads should be spun up and run with no issue. If there is a problem creating the threads, the model should stop and an error message should appear.
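As a rough illustration of the expected behavior described above, here is a minimal, generic sketch of a watchdog around child processes. It is not ActivitySim's code and does not use any ActivitySim API; the function name `run_with_watchdog` and its parameters are hypothetical, chosen only for this example. It assumes each worker can be launched as a separate command, and it raises an error and kills every remaining child if any worker is still running past a deadline, rather than hanging silently.

```python
import subprocess
import sys
import time


def run_with_watchdog(commands, timeout_s):
    """Start one child process per command (hypothetical helper).

    Returns the list of exit codes if all children finish in time.
    Raises TimeoutError if any child is still running after timeout_s.
    """
    procs = [subprocess.Popen(cmd) for cmd in commands]
    deadline = time.monotonic() + timeout_s
    try:
        # Poll until every child has exited or the deadline passes.
        while any(p.poll() is None for p in procs):
            if time.monotonic() > deadline:
                stuck = [i for i, p in enumerate(procs) if p.poll() is None]
                raise TimeoutError(
                    f"workers {stuck} still running after {timeout_s}s"
                )
            time.sleep(0.05)
        return [p.returncode for p in procs]
    finally:
        # Kill any stragglers so no orphaned Python processes are left behind.
        for p in procs:
            if p.poll() is None:
                p.kill()
                p.wait()


if __name__ == "__main__":
    quick = [sys.executable, "-c", "pass"]
    print(run_with_watchdog([quick, quick], timeout_s=30))
```

Note that `Popen.kill()` only terminates the direct child; if a worker has spawned its own subprocesses, those would need separate handling.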
I've got a way for people with access to SANDAG's servers to see this.
- Remote into our server Cougar (please check in task manager to see if someone else is using it before continuing).
- Open Anaconda Prompt.
- Run the following commands:
activate asim_134 :: Environment with ActivitySim 1.3.4
cd C:\abm_runs\jflood\abm3_asim132_commandline
run_asim.bat
It should run for a while and then freeze when the threads for the mp_households process are being spun up (probably after mp_households_30).
This might be related to #883, which is still unsolved.
Possibly. In that issue did the processes that were able to start complete? In my runs everything is just freezing. I did check the number of sockets on the servers where it's happened and one had 2 and the other had 1.
No, they did not complete. Everything was freezing like yours. If it also failed on a 1-socket machine, then the socket count might not be the cause. I still wonder if this could be machine-specific, because I did not have this problem when running with the same settings as in issue 883 on a WSP machine. Were you able to run ABM3 with 40 processors before? I don't have access to the SANDAG machine.
In addition to this problem, may I ask if you've tried running ABM3 with fewer processors and observed how the run time changes as you add more? Based on the tests David and I did in issue https://github.com/ActivitySim/sandag-abm3-example/issues/9, adding more processors doesn't always improve run time. You might be able to get a similar, if not lower, run time with fewer processors.
Ah, it does seem like it might be the same issue then.
I haven't noticed an increase in runtime when lowering the number of processors, and in fact may have seen a slight improvement.
I've run into this issue on 1.3.2 with the SANDAG model. ActivitySim will make the pipeline groups for whatever num_processors I tell it, but it would max out at 11 processes (mp_households0 through mp_households10) and get to a point where it stops. I'm guessing it's related to processes 12-n not starting or finishing.
Interestingly, I did NOT run into this issue at RSG with the MWCOG model, I was able to run 20 processors on it - that was likely with 1.3.1.