Leo Fang
For the CPU-GPU mutex: in this recipe I showed that it's possible to do a per-recipe mutex: https://github.com/nsls-ii-forge/qulacs-feedstock/blob/b8afd0011e9412703800b1c69f8b85eff00cffd5/recipe/meta.yaml#L56-L63 It's similar to the `mpi` mutex, except that it's done on a per-recipe...
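A minimal sketch of the pattern (package and variable names here are hypothetical; see the linked meta.yaml for the real recipe): one output keeps the same name and version across variants but carries a different build string, so the solver can never install the CPU and GPU variants side by side:

```yaml
# meta.yaml (abridged sketch, hypothetical names) -- per-recipe mutex
outputs:
  # The mutex: same name/version in both variants, but a different build
  # string, so the "cpu" and "gpu" builds can never be co-installed.
  - name: mypkg-mutex
    version: "1.0"
    build:
      string: {{ mutex_variant }}        # "cpu" or "gpu"
  # The real package pins the mutex to one variant via its build string.
  - name: mypkg
    requirements:
      run:
        - mypkg-mutex 1.0 {{ mutex_variant }}
```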
conda-forge CUDA packages are self-contained. You don't need to do `module load` or anything like that to set up a CUDA environment. As long as all compute nodes have a...
`CONDA_OVERRIDE_CUDA` allows you to pretend you have a different driver version (which is the basis of the `__cuda` virtual package) from the one installed on the system. It may or may...
Or, even worse, when you don't have a physical GPU or the driver installed at all. So yes, in this case it makes sense to use the env var to build an environment.
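The precedence can be sketched like this (a sketch of the documented behavior only, not conda's actual code; `effective_cuda_version` is a hypothetical helper name):

```python
import os
from typing import Optional

def effective_cuda_version(detected: Optional[str]) -> Optional[str]:
    """Version advertised for the ``__cuda`` virtual package.

    Sketch of the documented precedence: ``CONDA_OVERRIDE_CUDA`` wins over
    the driver version detected on the system, and setting it to an empty
    string hides ``__cuda`` entirely.
    """
    override = os.environ.get("CONDA_OVERRIDE_CUDA")
    if override is not None:
        return override.strip() or None  # "" means: no __cuda at all
    return detected
```

So on a driver-less build machine, `CONDA_OVERRIDE_CUDA=12.0 conda install ...` makes the solver behave as if a CUDA 12.0 driver were present.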
Note to self: This is a known perf weakness that the upstream will address in https://github.com/NVIDIA/cub/pull/578.
> Note to self: This is a known perf weakness that the upstream will address in [NVIDIA/cub#578](https://github.com/NVIDIA/cub/pull/578).

@jrhemstad @gevtushenko Is https://github.com/NVIDIA/cccl/pull/3969 enough to fix this perf issue, or do we need...
@srinivasyadav18 told me offline that the new fixed-size CUB reduction should be all we need. Setting this to blocked so that I remember to circle back after updating the CCCL...
This is exactly how `__cuda` is implemented: https://github.com/conda/conda/blob/5ebfc3e7cf4511794fa352d183062e5147d808d8/conda/plugins/virtual_packages/cuda.py#L108

Calling `nvidia-smi` is slightly worse:
- under the hood, it still loads `libcuda` and other DSOs
- you need a subprocess which...
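A rough sketch of the driver-API approach (Linux sonames assumed; conda's real implementation differs in details, e.g. it isolates the driver call from the parent process):

```python
import ctypes

def detect_cuda_version():
    """Ask the driver for its max supported CUDA version via libcuda.

    Returns e.g. "12.2", or None when no usable driver is installed.
    """
    # Try common Linux sonames for the driver library (assumption).
    for soname in ("libcuda.so", "libcuda.so.1"):
        try:
            libcuda = ctypes.CDLL(soname)
            break
        except OSError:
            continue
    else:
        return None  # no driver library found

    if libcuda.cuInit(0) != 0:  # CUDA_SUCCESS == 0
        return None
    version = ctypes.c_int(0)
    if libcuda.cuDriverGetVersion(ctypes.byref(version)) != 0:
        return None
    # cuDriverGetVersion encodes 1000*major + 10*minor, e.g. 12020 -> 12.2
    return f"{version.value // 1000}.{(version.value % 1000) // 10}"
```

No subprocess, no `nvidia-smi`: it loads `libcuda` directly and makes exactly the two driver calls needed.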
FWIW CUDA dropped ppc64le support entirely by CUDA 12.5. In my (personal) opinion the maintenance overhead of ppc packages on conda-forge is too high and we should drop it too...
Hi, could you please push your code to the same branch so that all the changes are contained in a single PR, instead of creating multiple PRs?