
Skip pyNVML to support Tegra devices

Open · pentschev opened this pull request 4 years ago • 13 comments

Fixes #400

pentschev avatar Sep 25 '20 21:09 pentschev

cc @JasonAtNvidia

pentschev avatar Sep 25 '20 21:09 pentschev

Codecov Report

Merging #402 into branch-0.16 will decrease coverage by 2.65%. The diff coverage is 69.23%.

Impacted file tree graph

@@               Coverage Diff               @@
##           branch-0.16     #402      +/-   ##
===============================================
- Coverage        59.65%   56.99%   -2.66%     
===============================================
  Files               17       19       +2     
  Lines             1331     1451     +120     
===============================================
+ Hits               794      827      +33     
- Misses             537      624      +87     
Impacted Files Coverage Δ
dask_cuda/utils.py 85.31% <69.23%> (-2.00%) :arrow_down:
dask_cuda/explicit_comms/dataframe_merge.py 90.38% <0.00%> (-0.45%) :arrow_down:
dask_cuda/explicit_comms/__init__.py 100.00% <0.00%> (ø)
dask_cuda/benchmarks/local_cupy_map_overlap.py 0.00% <0.00%> (ø)
dask_cuda/explicit_comms/dataframe_shuffle.py 95.52% <0.00%> (ø)
dask_cuda/device_host_file.py 98.64% <0.00%> (+0.03%) :arrow_up:
dask_cuda/cli/dask_cuda_worker.py 96.77% <0.00%> (+0.05%) :arrow_up:
dask_cuda/initialize.py 92.59% <0.00%> (+0.28%) :arrow_up:
dask_cuda/cuda_worker.py 72.28% <0.00%> (+0.86%) :arrow_up:
dask_cuda/local_cuda_cluster.py 82.95% <0.00%> (+0.93%) :arrow_up:
... and 2 more

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Powered by Codecov. Last update 71fad3a...0c9f29a. Read the comment docs.

codecov-commenter avatar Sep 25 '20 21:09 codecov-commenter

Would this make sense to do generally, or have we found issues with this approach in some cases? Asking as we now know of two cases where NVML is not an option: Tegra and WSL.

jakirkham avatar Sep 25 '20 21:09 jakirkham

Would this make sense to do generally, or have we found issues with this approach in some cases? Asking as we now know of two cases where NVML is not an option: Tegra and WSL.

Using Numba to do that implies creating a context, and that can be tricky in certain situations where we need the context creation to be delayed. We should evaluate context creation on a case-by-case basis as this has proven problematic in several instances in the past.

TBH, I don't know if things are 100% correct for Tegra, but it's very difficult to evaluate much further without being able to test. Long story short: I hope this PR will provide some support for Tegra, but we can't make any guarantees without extensive testing; the same applies to WSL or any other platforms we may want to support in the future.
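For illustration, the NVML-first approach with a delayed Numba fallback could look roughly like this. This is a hedged sketch, not dask-cuda's actual implementation; the function name and structure are illustrative only:

```python
def get_device_total_memory(device_index=0):
    """Sketch: prefer NVML for the device's total memory, falling back to a
    CUDA-context query on platforms (e.g., Tegra) where NVML is unavailable.
    Illustrative only -- not dask-cuda's real code path."""
    try:
        import pynvml

        pynvml.nvmlInit()
        handle = pynvml.nvmlDeviceGetHandleByIndex(device_index)
        # NVML answers without creating a CUDA context.
        return pynvml.nvmlDeviceGetMemoryInfo(handle).total
    except Exception:
        # Fallback: this creates a CUDA context as a side effect, which can
        # be a problem when context creation must be delayed (and it fails
        # outright if RMM's memory manager is plugged into Numba).
        import numba.cuda

        return numba.cuda.current_context().get_memory_info()[1]
```

The try/except keeps NVML as the primary path while still giving non-NVML platforms a chance to answer the query.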

pentschev avatar Sep 28 '20 09:09 pentschev

Using Numba to do that implies creating a context, and that can be tricky in certain situations where we need the context creation to be delayed. We should evaluate context creation on a case-by-case basis as this has proven problematic in several instances in the past.

Won't it already be tricky to initialize the context? AIUI we wanted this for multi-GPU devices.

TBH, I don't know if things are 100% correct for Tegra, but it's very difficult to evaluate much further without being able to test. Long story short: I hope this PR will provide some support for Tegra, but we can't make any guarantees without extensive testing; the same applies to WSL or any other platforms we may want to support in the future.

Fair enough. I think we have to rely on folks being pretty engaged to keep things working. This is also why I'm wondering if we can standardize on a common code path that we are able to test in other cases, at least.

jakirkham avatar Oct 07 '20 17:10 jakirkham

In a discussion with @JasonAtNvidia and @jakirkham, we decided to move this to 0.17. One of the limitations we found is getting GPU information, such as memory, without NVML. That requires Numba, and since we replace Numba's memory manager with RMM's, and RMM doesn't implement that functionality, we get errors:

============================= test session starts ==============================
platform linux -- Python 3.7.8, pytest-6.1.1, py-1.9.0, pluggy-0.13.1 -- /mnt/data/miniforge3/envs/dasktest/bin/python
cachedir: .pytest_cache
rootdir: /home/nvidia/Documents/dask-cuda
collecting ... collected 998 items / 1 error / 1 skipped / 996 selected
==================================== ERRORS ====================================
________________ ERROR collecting dask_cuda/tests/test_spill.py ________________
dask_cuda/tests/test_spill.py:18: in <module>
    if utils.get_device_total_memory() < 1e10:
/mnt/data/miniforge3/envs/dasktest/lib/python3.7/site-packages/dask_cuda-0.16.0a0+95.g04dcbb6.dirty-py3.7.egg/dask_cuda/utils.py:175: in get_device_total_memory
    return numba.cuda.current_context().get_memory_info()[1]
/mnt/data/miniforge3/envs/dasktest/lib/python3.7/site-packages/numba/cuda/cudadrv/driver.py:1040: in get_memory_info
    return self.memory_manager.get_memory_info()
/mnt/data/miniforge3/envs/dasktest/lib/python3.7/site-packages/rmm/rmm.py:214: in get_memory_info
    raise NotImplementedError()
E   NotImplementedError
=============================== warnings summary ===============================
../../../../mnt/data/miniforge3/envs/dasktest/lib/python3.7/site-packages/pandas/util/__init__.py:12
  /mnt/data/miniforge3/envs/dasktest/lib/python3.7/site-packages/pandas/util/__init__.py:12: FutureWarning: pandas.util.testing is deprecated. Use the functions in the public API at pandas.testing instead.
    import pandas.util.testing
dask_cuda/tests/test_local_cuda_cluster.py:76
  /home/nvidia/Documents/dask-cuda/dask_cuda/tests/test_local_cuda_cluster.py:76: PytestUnknownMarkWarning: Unknown pytest.mark.asyncio - is this a typo?  You can register custom marks to avoid this warning - for details, see https://docs.pytest.org/en/stable/mark.html
    @pytest.mark.asyncio
dask_cuda/tests/test_local_cuda_cluster.py:89
  /home/nvidia/Documents/dask-cuda/dask_cuda/tests/test_local_cuda_cluster.py:89: PytestUnknownMarkWarning: Unknown pytest.mark.asyncio - is this a typo?  You can register custom marks to avoid this warning - for details, see https://docs.pytest.org/en/stable/mark.html
    @pytest.mark.asyncio
-- Docs: https://docs.pytest.org/en/stable/warnings.html
=========================== short test summary info ============================
ERROR dask_cuda/tests/test_spill.py - NotImplementedError
!!!!!!!!!!!!!!!!!!!! Interrupted: 1 error during collection !!!!!!!!!!!!!!!!!!!!
=================== 1 skipped, 3 warnings, 1 error in 4.28s ====================
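The failure mode in the log above can be reproduced without a GPU using stand-in classes that mirror the Numba/RMM interaction. These are illustrative stand-ins, not the real Numba or RMM types:

```python
class RMMLikeMemoryManager:
    """Stand-in for RMM's Numba memory-manager plugin, which (as of this
    discussion) does not implement get_memory_info()."""

    def get_memory_info(self):
        raise NotImplementedError()  # mirrors rmm/rmm.py:214 in the log


class ContextLike:
    """Stand-in for Numba's CUDA context, which delegates the query to
    whatever memory manager is currently installed."""

    memory_manager = RMMLikeMemoryManager()

    def get_memory_info(self):
        # numba/cuda/cudadrv/driver.py delegates to the memory manager
        return self.memory_manager.get_memory_info()


try:
    ContextLike().get_memory_info()
except NotImplementedError:
    print("RMM-managed context cannot report memory info")
```

This is why any non-NVML path that relies on `numba.cuda.current_context().get_memory_info()` breaks as soon as RMM is installed as the allocator.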

pentschev avatar Oct 07 '20 19:10 pentschev

Won't it already be tricky to initialize the context? AIUI we wanted this for multi-GPU devices.

Even single-GPU devices can benefit from these changes for Tegra clusters.

Fair enough. I think we have to rely on folks being pretty engaged to keep things working. This is also why I'm wondering if we can standardize on a common code path that we are able to test in other cases, at least.

It seems we'll soon begin testing more with Tegra, so that will help us here. To be frank, I don't want to move away from NVML only because of Tegra: NVML is much better in various ways (e.g., no need to create a CUDA context) and supports features like gathering CPU affinity for each GPU that other tools don't, so I'd rather have two code paths.
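As an illustration of the two-code-paths idea, here is a hedged sketch of an NVML-only query (CPU affinity) that simply degrades to "no information" on platforms without NVML. It assumes pynvml's `nvmlDeviceGetCpuAffinity`, which returns the affinity as a bitmask packed into 64-bit words; the function name and fallback behavior are illustrative:

```python
import math
import multiprocessing


def get_cpu_affinity(device_index=0):
    """Sketch: list the CPUs with affinity to a GPU via NVML, or return
    None on platforms without NVML (e.g., Tegra). Illustrative only."""
    try:
        import pynvml

        pynvml.nvmlInit()
        handle = pynvml.nvmlDeviceGetHandleByIndex(device_index)
        cpu_count = multiprocessing.cpu_count()
        # One 64-bit word per 64 CPUs in the affinity bitmask.
        affinity = pynvml.nvmlDeviceGetCpuAffinity(
            handle, math.ceil(cpu_count / 64)
        )
        return [i for i in range(cpu_count) if affinity[i // 64] & (1 << (i % 64))]
    except Exception:
        # No NVML available: there is no equivalent query, so report nothing.
        return None
```

Since there is no Numba equivalent for this query, the non-NVML path can only report the absence of the information rather than compute it another way.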

pentschev avatar Oct 07 '20 20:10 pentschev

It seems we'll have some Tegra devices in CI soon, we can then push this PR forward.

pentschev avatar Nov 30 '20 21:11 pentschev

This PR has been marked stale due to no recent activity in the past 30d. Please close this PR if it is no longer required. Otherwise, please respond with a comment indicating any updates. This PR will be marked rotten if there is no activity in the next 60d.

github-actions[bot] avatar Feb 16 '21 19:02 github-actions[bot]

I believe we are still waiting on Tegra devices

quasiben avatar Feb 17 '21 13:02 quasiben

@mike-wendt @raydouglass based on the discussion from Ops Demo Tuesday, this is a PR we would like to test with Tegra devices when we have it available in CI.

pentschev avatar Feb 23 '21 22:02 pentschev

@mike-wendt @raydouglass based on the discussion from Ops Demo Tuesday, this is a PR we would like to test with Tegra devices when we have it available in CI.

@pentschev @quasiben as an update, @Ethyling and I are working on getting the L4T images integrated on Monday. Once we get that done, we'll have a platform to easily test these builds for CUDA 10.2, as that's the only CUDA version currently supported on L4T/AGX.

mike-wendt avatar Mar 19 '21 15:03 mike-wendt

Sounds great, thanks @mike-wendt! Let us know if/how we can assist.

pentschev avatar Mar 19 '21 16:03 pentschev