
[FEA] Consider using different default values for cluster configurations

Open randerzander opened this issue 4 years ago • 7 comments

In our gpu-bdb benchmarking, we found we needed to configure quite a few bash environment variables when setting up a dask-cuda cluster for optimal performance.

For example, if you're not using UCX, GPU-worker-to-GPU-worker communication over TCP will likely have higher latency, if not lower throughput. To avoid TCP connection failures we needed to reconfigure Dask's "comm" settings:

export DASK_DISTRIBUTED__COMM__TIMEOUTS__CONNECT="100s"
export DASK_DISTRIBUTED__COMM__TIMEOUTS__TCP="600s"
export DASK_DISTRIBUTED__COMM__RETRY__DELAY__MIN="1s"
export DASK_DISTRIBUTED__COMM__RETRY__DELAY__MAX="60s"
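
For reference, the same settings can also be applied in-process with dask.config.set before the cluster is created. This is a minimal sketch using the standard dotted config keys that those DASK_DISTRIBUTED__* environment variables map to; the timeout/retry values are just the ones from this issue, not recommended defaults:

import dask

# Equivalent of the exports above, expressed as Dask config keys.
dask.config.set({
    "distributed.comm.timeouts.connect": "100s",
    "distributed.comm.timeouts.tcp": "600s",
    "distributed.comm.retry.delay.min": "1s",
    "distributed.comm.retry.delay.max": "60s",
})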

I suspect a great many dask-cuda users will be using TCP rather than UCX, so they would benefit from dask-cuda automatically reconfiguring Dask's TCP defaults.

We also need to set --memory-limit, --device-memory-limit, and --rmm-pool-size values:

# Total host memory (from free -m, i.e. MiB), with an "M" suffix for Dask's byte parser
MAX_SYSTEM_MEMORY=$(free -m | awk '/^Mem:/{print $2}')M
DEVICE_MEMORY_LIMIT="25GB"  # per-GPU threshold for spilling from device to host
POOL_SIZE="30GB"            # per-GPU RMM memory pool

dask-cuda-worker --device-memory-limit $DEVICE_MEMORY_LIMIT --rmm-pool-size=$POOL_SIZE --memory-limit=$MAX_SYSTEM_MEMORY ...
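
For completeness, when launching from Python rather than the command line, the same limits can be passed to LocalCUDACluster. A minimal sketch with the values above (memory_limit="auto" being the usual default):

from dask.distributed import Client
from dask_cuda import LocalCUDACluster

# Mirrors the dask-cuda-worker flags above: one worker per GPU, spilling
# from device to host at 25 GB, with a 30 GB RMM pool per GPU.
cluster = LocalCUDACluster(
    device_memory_limit="25GB",
    rmm_pool_size="30GB",
    memory_limit="auto",
)
client = Client(cluster)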

It would be nice if dask-cuda included logic for detecting the memory of each GPU and deriving default values for the device memory limit and RMM pool size from it. As a starting point, for a 32GB card, a 25GB device memory limit with a 30GB pool size works well for the 30 TPCx-BB queries. I believe that's a decent representation of typical workflows, and those values could be scaled proportionally for cards with other memory sizes as well.

randerzander avatar Jul 24 '20 15:07 randerzander

I think having some of these defaults makes sense. In Dask core, --memory-limit is set to auto by default.

In [1]: import pynvml

In [2]: pynvml.nvmlInit()

In [3]: handle = pynvml.nvmlDeviceGetHandleByIndex(0)

In [4]: info = pynvml.nvmlDeviceGetMemoryInfo(handle)

In [5]: info.total
Out[5]: 34089730048

Maybe rmm-pool-size = total * 90%? @pentschev @jakirkham, do you have thoughts here?
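
A minimal sketch of what that heuristic could look like, building on the pynvml call above. The 90% pool factor is just the suggestion here, and the ~80% device-memory-limit factor roughly mirrors the 25 GB / 32 GB value from the issue description; neither is an established default:

import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
total = pynvml.nvmlDeviceGetMemoryInfo(handle).total  # bytes

# Illustrative only: pool at 90% of the card, spill threshold at ~80%,
# roughly matching the 30 GB / 25 GB values reported above for a 32 GB GPU.
default_rmm_pool_size = int(total * 0.90)
default_device_memory_limit = int(total * 0.80)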

quasiben avatar Jul 24 '20 17:07 quasiben

@mt-jones may remember why we had to set --memory-limit explicitly. There might have been a bug where it wasn't being set properly, or the heuristic Dask uses may be inappropriate for, say, a DGX-1.

randerzander avatar Jul 24 '20 18:07 randerzander

While I understand how changing such defaults makes sense for TPCx-BB, I'm not so sure it makes sense for everyone. It won't be difficult to find people trying to run dask-cuda alongside other applications that also require memory (either device or host), and defaults tuned this aggressively could be a bit annoying for those users. Think, for example, of someone running this on a workstation where one of the GPUs also drives the display. That's not to say I'm totally against the idea, but I'm a bit concerned about the broader usability of dask-cuda.

The --memory-limit issue was really a bug that got fixed in https://github.com/rapidsai/dask-cuda/pull/269.

As for the communication variables, I think they were not for the workers (or not only, perhaps), but for the scheduler, am I mistaken? Changing the defaults for the scheduler would have to go upstream to distributed. As for workers, if we change those values we should probably check what the really appropriate numbers are; I have a feeling those were just set arbitrarily high to make things work, was that not the case? And when we scale up, would we have to scale those numbers too?

pentschev avatar Jul 24 '20 18:07 pentschev

I was just thinking, after writing the comment above: maybe we should have some sort of "default recipes" for different use cases? E.g., the TPCx-BB case could use a "performance recipe", while today's defaults could be something like a "conservative recipe", and so on. I think this would reduce complexity across multiple use cases while still providing defaults that work mostly anywhere, with an easy switch between them.
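
To make that concrete, a purely hypothetical sketch of what such recipes might look like; the names and numbers are illustrative only, nothing like this exists in dask-cuda today:

# Hypothetical recipe table; fractions are of total GPU memory per device.
RECIPES = {
    "conservative": {           # close to today's defaults
        "device_memory_limit": 0.8,
        "rmm_pool_size": None,  # no RMM pool
    },
    "performance": {            # TPCx-BB-style settings from this issue
        "device_memory_limit": 0.8,
        "rmm_pool_size": 0.9,
    },
}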

pentschev avatar Jul 24 '20 18:07 pentschev

I forgot to mention another related (potentially duplicate) issue: https://github.com/rapidsai/dask-cuda/issues/334.

pentschev avatar Jul 24 '20 20:07 pentschev

This issue has been marked stale due to no recent activity in the past 30d. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be marked rotten if there is no activity in the next 60d.

github-actions[bot] avatar Feb 16 '21 19:02 github-actions[bot]

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

github-actions[bot] avatar May 17 '21 19:05 github-actions[bot]