Initialization hangs when only setting --hpx:cores
Expected Behavior
If --hpx:threads hasn't been set and --hpx:cores has been set to some (smaller than default) value, either silently set the number of OS threads equal to the number of specified cores, or throw an error unless the user also explicitly passes a sufficiently small value for --hpx:threads (such that OS threads <= cores).
If both have been set by the user such that --hpx:threads > --hpx:cores, either throw an error or silently reduce the number of OS threads so that OS threads == cores.
(I assume that it's a bad idea to spawn more OS threads than cores we are allowed to run on.)
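The two options above (clamp silently, or throw) can be sketched as a small resolution function. This is only an illustration under stated assumptions: `resolve_os_threads`, `oversubscription_policy`, and the `default_threads` parameter are hypothetical names, not HPX APIs, and taking the smaller of the default and the core count when only --hpx:cores is given is my assumption about the desired behavior.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <stdexcept>

// Hypothetical policy switch (not part of HPX): what to do when the
// requested number of OS threads exceeds the number of cores.
enum class oversubscription_policy { clamp, error };

// Hypothetical helper (not an HPX API): derive the number of OS threads
// from the optional --hpx:threads / --hpx:cores values.
std::size_t resolve_os_threads(std::optional<std::size_t> threads,
    std::optional<std::size_t> cores, std::size_t default_threads,
    oversubscription_policy policy = oversubscription_policy::clamp)
{
    assert(default_threads > 0);

    // No core limit given: use the explicit thread count or the default.
    if (!cores)
        return threads.value_or(default_threads);

    if (*cores == 0)
        throw std::invalid_argument("--hpx:cores must be greater than 0");

    // Only --hpx:cores given: never start more OS threads than cores
    // (assumption: a default smaller than the core count is kept as-is).
    if (!threads)
        return std::min(default_threads, *cores);

    // Both given and consistent: honor the explicit thread count.
    if (*threads <= *cores)
        return *threads;

    // Both given, but threads > cores: reject or clamp, per policy.
    if (policy == oversubscription_policy::error)
        throw std::invalid_argument("--hpx:threads exceeds --hpx:cores");
    return *cores;
}
```

With a default of 8 OS threads, `resolve_os_threads(std::nullopt, 2, 8)` yields 2, and `resolve_os_threads(4, 2, 8)` yields 2 under the clamping policy or throws under the error policy.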
Actual Behavior
Initialization hangs when only setting --hpx:cores to a value smaller than the default number of OS threads, or when setting both --hpx:threads and --hpx:cores such that --hpx:threads > --hpx:cores.
@hkaiser How do you think we should handle this? I stumbled on this because the options_as_config regression test hangs on Rostam. I don't think I have any peculiar setup, so shouldn't it also fail on the CIs in the same manner?
Hi! I noticed that the options_as_config test hangs when --hpx:cores is set to a value smaller than the default number of OS threads, or when --hpx:threads is set greater than --hpx:cores.
Did you consider validating this combination during initialization? Would it make sense to either clamp --hpx:threads to match --hpx:cores, or throw an error if the user sets an invalid combination?
@Pansysk75 When you get a chance, could you please take a look and let me know your thoughts?
Hi @Prachethan7, I think we should do something if --hpx:cores < --hpx:threads. If my mental model is correct, it isn't desirable to have more worker threads than resources. Emitting a warning would probably be the safest bet, but maybe an error would be fine too.
@Pansysk75 Agreed, having more threads than cores isn’t ideal and can lead to oversubscription or hangs. Emitting a warning sounds like a safe and helpful first step. We could even consider making it an error under a strict mode if needed.
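To make the "warning by default, error under a strict mode" idea concrete, here is a rough sketch. It assumes both values are already known at validation time; `check_threads_vs_cores` and the `strict` flag are hypothetical names for illustration and do not exist in HPX.

```cpp
#include <cstddef>
#include <optional>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical check (not an HPX API): returns a warning message when
// threads > cores, or throws instead when strict mode is requested.
// Returns std::nullopt when the combination is fine.
std::optional<std::string> check_threads_vs_cores(
    std::size_t threads, std::size_t cores, bool strict = false)
{
    if (threads <= cores)
        return std::nullopt;

    std::ostringstream msg;
    msg << "--hpx:threads (" << threads << ") exceeds --hpx:cores ("
        << cores << ")";

    // Strict mode: treat the invalid combination as a hard error.
    if (strict)
        throw std::invalid_argument(msg.str());

    // Default mode: hand the caller a warning to log.
    return "warning: " + msg.str();
}
```

A caller could print the returned message to its log and continue, or pass `strict = true` to fail early instead of risking a hang during initialization.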