
Initialization hangs when only setting --hpx:cores

Open Pansysk75 opened this issue 2 years ago • 3 comments

Expected Behavior

If --hpx:threads hasn't been set and --hpx:cores has been set to a value smaller than the default number of OS threads, either silently set the number of OS threads equal to the number of specified cores, or throw an error unless the user also explicitly sets --hpx:threads to a value small enough that OS threads <= cores.

If both have been set by the user such that --hpx:threads > --hpx:cores, either throw an error or silently reduce the number of OS threads so that OS threads == cores.

(I assume that it's a bad idea to spawn more OS threads than cores we are allowed to run on.)
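The two rules above reduce to a single resolution step: take the requested thread count (or the default when --hpx:threads is absent), and if it exceeds --hpx:cores, either clamp it or reject it. This is a minimal sketch, not HPX code. `resolve_os_threads` and `oversubscription_policy` are hypothetical names, and the sketch assumes the runtime already knows its default thread count when it applies --hpx:cores.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical policy for a --hpx:threads/--hpx:cores combination that would
// start more OS threads than cores. Not part of HPX's API.
enum class oversubscription_policy { clamp, error };

// Hypothetical helper: returns the number of OS threads to start.
// `threads` and `cores` hold the user's --hpx:threads / --hpx:cores values
// (std::nullopt means the option was not given); `default_threads` is what
// the runtime would use if neither option were set.
std::size_t resolve_os_threads(std::optional<std::size_t> threads,
                               std::optional<std::size_t> cores,
                               std::size_t default_threads,
                               oversubscription_policy policy)
{
    assert(default_threads > 0 && "default thread count must be positive");

    // Without --hpx:threads, fall back to the default thread count.
    std::size_t const requested = threads.value_or(default_threads);

    // No core limit, or at most one OS thread per allowed core: keep as is.
    if (!cores || requested <= *cores)
        return requested;

    // More OS threads than cores: either reduce to one thread per core...
    if (policy == oversubscription_policy::clamp)
        return *cores;

    // ...or refuse to start and tell the user how to fix the command line.
    throw std::invalid_argument(
        "number of OS threads (" + std::to_string(requested) +
        ") exceeds --hpx:cores (" + std::to_string(*cores) +
        "); pass a smaller --hpx:threads");
}
```

With `clamp`, passing only `--hpx:cores=4` on a machine whose default is 8 threads yields 4 OS threads; with `error`, the same input throws unless `--hpx:threads` is also set to 4 or fewer.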

Actual Behavior

Initialization hangs when only --hpx:cores is set to a value smaller than the default number of OS threads, or when both --hpx:threads and --hpx:cores are set such that --hpx:threads > --hpx:cores.

@hkaiser How do you think we should handle this? I stumbled on this because the options_as_config regression test hangs on Rostam. I don't think my setup is unusual, so shouldn't the test also hang on CI in the same way?

Pansysk75, Sep 07 '23 13:09

Hi! I noticed that the options_as_config test hangs when --hpx:cores is set to a value smaller than the default number of OS threads, or when --hpx:threads is set greater than --hpx:cores.

Have you considered validating this combination during initialization? Would it make sense to either clamp --hpx:threads to match --hpx:cores, or throw an error if the user sets an invalid combination?

@Pansysk75 When you get a chance, could you please take a look and let me know your thoughts?

Prachethan7, Apr 18 '25 04:04

Hi @Prachethan7, I think we should do something if --hpx:cores < --hpx:threads. If my mental model is correct, it isn't desirable to have more worker threads than resources. Emitting a warning would probably be the safest bet, but maybe an error would be fine too.

Pansysk75, Apr 20 '25 16:04

@Pansysk75 Agreed: having more threads than cores isn't ideal and can lead to oversubscription or hangs. Emitting a warning sounds like a safe and helpful first step. We could even consider making it an error under a strict mode if needed.

Prachethan7, Apr 20 '25 20:04
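A rough sketch of the warning-first approach discussed above, with an opt-in strict mode that turns the warning into an error. `check_oversubscription`, `report_oversubscription`, and the `strict` flag are hypothetical; this does not assume HPX exposes a strict mode or a check with this shape.

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical check (not an HPX API): returns a warning message when more
// worker threads than cores are requested, or throws if strict mode is on.
std::optional<std::string> check_oversubscription(std::size_t threads,
                                                  std::size_t cores,
                                                  bool strict)
{
    assert(cores > 0 && "--hpx:cores must be positive");

    // At most one worker thread per core: nothing to report.
    if (threads <= cores)
        return std::nullopt;

    std::string const msg = "--hpx:threads=" + std::to_string(threads) +
        " exceeds --hpx:cores=" + std::to_string(cores) +
        "; worker threads will oversubscribe the available cores";

    // Strict mode: reject the combination outright.
    if (strict)
        throw std::invalid_argument(msg);

    // Default: let the caller decide how to surface the warning.
    return msg;
}

// Hypothetical caller: print the warning to stderr and keep going.
void report_oversubscription(std::size_t threads, std::size_t cores, bool strict)
{
    if (auto warning = check_oversubscription(threads, cores, strict))
        std::cerr << "warning: " << *warning << '\n';
}
```

Returning the message instead of printing it inside the check keeps the policy (warn vs. error) separate from how the runtime reports diagnostics.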