
Capability to change each hyperspace to use less than a node

jdakka opened this issue Nov 14 '18 · 14 comments

Is there a way to change the HyperSpace models to use less than a node per process? This will be the case for XSEDE Bridges if we decide to run simulations there, since we won't have enough nodes to spawn 16 MPI processes with each process taking a full node. Also, on Summit the number of cores per node will fluctuate anywhere from 42 to 128 cores. Ideally each process would use the same number of cores, but we should benchmark to find the sweet spot for the number of cores per process.

jdakka avatar Nov 14 '18 22:11 jdakka

Sorry it has taken me so long to get back with you! I just saw this issue.

I may have caused some confusion when talking about HyperSpace running one optimization per node. Technically, it is one optimization per MPI rank, so depending on the system we can set a number of ranks per compute node. Over the last week or so I have been running 256 ranks on a single DGX. Normally the placement of MPI ranks would be handled by aprun on machines like Titan, or now by jsrun on Summit. Would the allocation of resources per MPI rank be handled by the RADICAL scheduler?
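To make the shape of this concrete, here is a minimal sketch (not hyperspace's actual driver, just the pattern) of one Bayesian optimization per MPI rank; how many of those ranks share a node is then entirely a placement question for the launcher or scheduler:

```python
# Minimal sketch of "one Bayesian optimization per MPI rank".
# Illustrative only -- hyperspace's real driver does the subspace
# bookkeeping; this just shows the parallel structure.
from mpi4py import MPI
from skopt import gp_minimize

comm = MPI.COMM_WORLD
rank = comm.Get_rank()


def objective(x):
    # Placeholder objective; in practice each rank evaluates its own subspace.
    return (x[0] - 2.0) ** 2


# Every rank runs an independent, single-core optimization loop.
result = gp_minimize(objective, dimensions=[(-5.0, 5.0)],
                     n_calls=20, random_state=rank)
print(f"rank {rank}: best x = {result.x}, value = {result.fun}")
```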

yngtodd avatar Nov 21 '18 03:11 yngtodd

Hi Todd. Following up on Jumana's question, the minimum is 1 MPI rank per optimization then, correct? Also, am I correct in assuming each optimization requires only 1 core? (I am interfacing through RADICAL.)

karahbit avatar Nov 07 '19 21:11 karahbit

Hey @karahbit, when asking about the minimum number of MPI ranks per optimization, do you mean the total number of ranks required by hyperspace for a given problem, or do you mean the number of ranks assigned to a given Bayesian optimization loop? Each Bayesian optimization loop gets one MPI rank, but hyperspace runs many of those in parallel, and the total number of ranks is 2^{D}, where D is the dimension of your search space.

In the simplest case, it is possible to run the algorithm over a single search dimension. Say this search space is the following:

x = [0, 1, 2, 3]

Hyperspace would divide that search space into two subintervals:

x_0 = [0, 1, 2]
x_1 = [1, 2, 3]

Then it would run two parallel Bayesian optimization steps, one for each subinterval of the search space. Each Bayesian optimization step gets its own MPI rank.

You are right that each Bayesian optimization step requires only one core. The optimization at each rank is handled by scikit-optimize, and it only needs a single core.
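To make that concrete, here is a rough sketch of the splitting idea (illustrative only, not hyperspace's internals): split each dimension into two overlapping halves and take the 2^{D} combinations of those halves.

```python
# Rough sketch of the overlapping split described above (illustrative
# only; hyperspace's actual implementation differs in detail).
from itertools import product


def split_dimension(values, overlap=1):
    """Split one search dimension into two overlapping halves."""
    mid = len(values) // 2
    return values[:mid + overlap], values[mid - overlap:]


x = [0, 1, 2, 3]
print(split_dimension(x))  # ([0, 1, 2], [1, 2, 3])

# With D dimensions, the subspaces are all combinations of the halves,
# giving 2**D subspaces -- one MPI rank (and one core) per subspace.
dims = [[0, 1, 2, 3], [10, 11, 12, 13]]        # D = 2
subspaces = list(product(*(split_dimension(d) for d in dims)))
print(len(subspaces))                          # 4 == 2**2
```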

yngtodd avatar Nov 09 '19 17:11 yngtodd

Excellent, thank you Todd. I just wanted to confirm my assumption because, as Jumana did, I want to run multiple Bayesian optimizations on a single node without requesting more cores than I actually need to.

karahbit avatar Nov 09 '19 18:11 karahbit

Hi @yngtodd, picking up from this previous conversation: I am doing hyperparameter optimization using HyperSpace! But I would like to understand its behavior a little better.

So I have 4 parameters: this would be 16 MPI ranks, requiring 16 cores. I was effectively able to solve the problem, but I would like to show how big and how fast I can go. Specifically, I am trying to show strong scaling behavior. However, when I ran HyperSpace for the same problem but using 8 MPI ranks, it was still able to finish it, taking approximately half the time. Do you mind giving me a brief explanation of what's going on behind the scenes here?

Thank you!

karahbit avatar Jun 23 '20 22:06 karahbit

Hey @karahbit , thanks for using the library.

How many results are being saved when you have 4 parameters but run with 8 MPI ranks? I have a sneaking suspicion that you may only have 8 results. Hyperspace would then only be running the Bayesian optimization on half of the subspaces. I just tried that on one of the benchmarks, and that seems to be the case. If you are also seeing that behavior, then I should add a warning for this.
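If it helps, a quick way to check is to compare the number of saved results against 2**D. (This assumes one result file per rank; the directory and glob pattern below are just placeholders, so adjust them to however your results are laid out.)

```python
# Quick sanity check: with D search dimensions there should be 2**D
# results. Assumes one result file is written per rank; the directory
# and glob pattern here are placeholders -- adjust to your setup.
import glob

n_dims = 4
expected = 2 ** n_dims
found = len(glob.glob("results/*"))

if found < expected:
    print(f"only {found} of {expected} subspaces were optimized")
```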

yngtodd avatar Jun 25 '20 02:06 yngtodd

Hi @yngtodd, of course, it has proven to be useful and I appreciate your work.

As you correctly say, when running 8 MPI processes for 4 hyperparameters, I see only 8 traces (optimizations). I'm missing the other half, so the solution is incomplete and the resulting score is not as good.

What about running the same required 16 MPI processes but on a smaller number of cores, say 8? I would be loading each core with 2 processes, slowing down the solution, but at least I would get an accurate one. I believe in the MPI world this is called oversubscribing. Do you know anything about this, and whether it is possible with Hyperspace?

karahbit avatar Jun 25 '20 02:06 karahbit

Yeah, that would be possible. In the case that num_subspaces > num_ranks, we could place the remaining num_subspaces - num_ranks subspaces on ranks that already have a search space to work on. I don't think it would take much to make that happen.

yngtodd avatar Jun 25 '20 03:06 yngtodd

In the case that you were just testing, you could use dualdrive(). That would run two of the subspaces on each rank. So if you want to use exactly half the number of ranks compared with the number of subspaces, you would be good to go. But if the number of ranks is not exactly half the number of subspaces, then you would be back to silently leaving out some of the spaces.
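Conceptually, the pairing looks something like this sketch (not the actual dualdrive code), which also shows where the silent gap appears when the counts don't line up:

```python
# Illustrative pairing of subspaces to ranks, two per rank, as in the
# dualdrive case described above (not hyperspace's actual code).
def pair_subspaces(subspaces, n_ranks):
    pairs = [subspaces[2 * r: 2 * r + 2] for r in range(n_ranks)]
    covered = sum(len(p) for p in pairs)
    if covered < len(subspaces):
        # This is the silent gap mentioned above: with too few ranks,
        # the trailing subspaces are never optimized.
        print(f"warning: {len(subspaces) - covered} subspaces left out")
    return pairs


# 16 subspaces on 8 ranks: every subspace is covered.
print(pair_subspaces(list(range(16)), 8))
# 16 subspaces on 4 ranks: half of them are silently dropped.
print(pair_subspaces(list(range(16)), 4))
```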

yngtodd avatar Jun 25 '20 03:06 yngtodd

I was actually taking a look at dualdrive. That seems to be a viable solution for the case of spawning 16 MPI processes (16 subspaces) on 8 cores/ranks. But what if we have only 4 cores, or 2? To give you some context, I ask these questions because I am concerned with the strong scaling behavior of the solution. Please correct me if I'm wrong about any terminology, as I am learning all of this.

karahbit avatar Jun 25 '20 03:06 karahbit

Yeah, in that case the dualdrive would not be the way to go. We would want to go with that new approach I started to mention.

yngtodd avatar Jun 25 '20 03:06 yngtodd

Ah, I see. So just playing around with the MPI launch command mpirun --hostfile hostfile -n 16 ... so that it places the 16 processes on 4 cores through a hostfile, for example, wouldn't do the trick. In other words, the oversubscribing functionality that MPI provides won't work for our purposes; we would need to modify the approach taken by Hyperspace itself. Is this correct, or am I missing something?

karahbit avatar Jun 25 '20 03:06 karahbit

Yeah, it would require some changes in Hyperspace. Originally, the subspaces were scattered out to the various ranks from rank 0. Now, each rank sees all of the subspaces and indexes into them by rank here. This is fine if you know you want exactly one subspace per rank and the number of ranks equals the number of subspaces. But when you want more than one subspace per rank, this would need to change.
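Just to sketch the idea (this is not a patch, and the names below are placeholders): instead of each rank taking subspaces[rank], each rank could take every size-th subspace round-robin, so any rank count up to 2**D still covers all of the subspaces.

```python
# Sketch of the generalization: round-robin assignment of subspaces to
# ranks, so n_ranks <= 2**D still covers every subspace. Placeholder
# names only -- not hyperspace's actual code.
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# Stand-in for the full list of 2**D subspaces.
subspaces = [f"subspace_{i}" for i in range(16)]

# Current behavior: my_subspaces = [subspaces[rank]]
# Generalized: each rank works through several subspaces in turn.
my_subspaces = subspaces[rank::size]

for space in my_subspaces:
    # Each subspace would get its own Bayesian optimization loop here.
    print(f"rank {rank} optimizing {space}")
```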

yngtodd avatar Jun 25 '20 14:06 yngtodd

Thank you for your input on this matter, it was really helpful!

karahbit avatar Jun 25 '20 17:06 karahbit