sst-core
sst-core copied to clipboard
Python option sst.setThreadCount doesn't work
New Issue for sst-core
1 - Detailed description of problem or enhancement Trying to set the thread count from Python doesn't work, using either of these calls:
nthreads = 2
sst.setThreadCount(nthreads)
sst.setProgramOption('num_threads', nthreads)
2 - Describe how to reproduce the issue
a. Add one of those calls to any Phython script b. Run the script with
$ sst --num_threads=4 ...
c. The script will run with the number of threads from the command line (4), not what was set in the Python (2)
3 - What's happening
In src/sst/core/main.cc
:
- Line 512: the command line is parsed by the Config object
cfg
- Line 521:
world_size.thread
is set with the value from cfg, which at this point only has command line input - Line 575: finally execute the Python, but the change to thread count is never propagated to
world_size
[Edit: added the important word "never"]
This is the expected behavior. Command line options should take precedence over options in python file. If setting the number of threads in the python script doesn’t work when the number of threads isn’t specified on the command line, that is a bug that needs to be looked at.
But even without a command line option setting threads from Python doesn't work.
pymodel.cc:setSSTThreadCount:394
:
static PyObject*
setSSTThreadCount(PyObject* UNUSED(self), PyObject* args)
{
Config* cfg = gModel->getConfig();
long oldNThr = cfg->num_threads();
long nThr = SST_ConvertToCppLong(args);
if ( nThr > 0 && nThr <= oldNThr ) gModel->setConfigEntryFromModel("num_threads", std::to_string(nThr));
return SST_ConvertToPythonLong(oldNThr);
}
This makes it appear that you can decrease the number of threads from Python, but not increase. But the default is 1 thread (config.cc:587), so you can't ever get more than one thread without a command line arg.
Even if that check where changed it still wouldn't work, as I explained above: the config value is consulted before the Python is executed, and never adjusted after that to see if Python has changed it.
JSON doesn't have the reduce-only problem since it has no special API/tag for setting number of threads, just program_options
, but it still has the problem that world_size.thread
is never updated.
Thanks. This should be relatively easy to fix. I’ll take a look at it and get a PR ready.
The more I think about this, the more I feel that setting the number of threads is not something that should be available in the input file and should be limited to the command line. That being said, I will leave it in for now and have fixed the issue with setting thread count in the python file and added a check to make sure that all ranks have the same number of threads set in the parallel load case (no way for them to be different for serial loads). I am also going to deprecate the setThreadCount() function as redundant to using setProgramOption(). I will close this once the new code gets merged.
IMO it should be possible to set the number of threads from the input file.
The input file describes the model. If the model performs best with a certain number of threads (which might depend on other model parameters, such as size), the only place to make that decision is in the input Python. Further, if a model is self-partitioned the input Python needs to assign Components to threads, using setRank(mpi_rank, thread)
. Finally, setting the thread count in Python aids reproducibility: what actually gets run isn't dependent on externals.
Understood. That’s why I went ahead and left it in and just added the check to make sure every rank has the same number of threads (we don’t test the case of different number of threads on the ranks, so I’m not sure it will work). I can also leave setThreadCount() in and not deprecate it if that’s your preferred method. I usually try to avoid redundancy, just for code maintenance reasons, but I looked at the code and while the python functionality is redundant, all the calls go back to the same underlying function eventually, so there isn’t a big code maintenance issue.
I'm ok with deprecating/removing setRank(mpi_rank, thread)
since setProgramOption()
provides the equivalent. It sounded like you also want to remove that ("setting the number of threads is not something that should be available in the input file "), which would be bad, IMO, as explained above.