jglaser
I do not need this functionality anymore, so I will only be able to provide limited guidance. Feel free to close if no one else needs this.
I am working on related stuff and will publish it as a PR soon.
@jcrist I am sorry if I sounded dismissive; I did not mean to discourage you. This is a much-needed project with a well-thought-out design. The only issue...
PR #21 fixes a few (last?) outstanding bugs in the calculation
Bill, could you specify which function requires the mutex? Yes, the number in front of the ":" is the global process rank. Eight processes per node are calling `rsmi_init()`...
ping. Is anyone seeing this? Do you need more context?
Hi... RCCL uses rocm_smi under the hood (https://github.com/ROCmSoftwarePlatform/rccl/blob/4643a17f83900dd84676fc61ebf03be0d9584d68/src/misc/rocm_smi_wrap.cc#L37-L43). PyTorch uses RCCL for distributed training and instantiates multiple processes per node when a node has multiple GPUs. This leads...
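For reference, here is a minimal multi-process sketch of that usage pattern (the fork loop and the counts are made up for illustration; `rsmi_init()` / `rsmi_shut_down()` are the actual rocm_smi entry points):

```cpp
// Sketch: several processes on the same node initialize and tear down
// rocm_smi concurrently, as RCCL does when PyTorch launches one rank per GPU.
// Link with -lrocm_smi64.
#include <rocm_smi/rocm_smi.h>

#include <sys/wait.h>
#include <unistd.h>

#include <cstdio>
#include <cstdlib>

int main() {
    const int ranks_per_node = 8;  // e.g. one process per GPU/GCD (illustrative)

    for (int rank = 0; rank < ranks_per_node; ++rank) {
        pid_t pid = fork();
        if (pid == 0) {
            // Each "rank" opens and closes the library a few times in a row,
            // which exercises the shared mutex inside rocm_smi_lib.
            for (int iter = 0; iter < 10; ++iter) {
                rsmi_status_t ret = rsmi_init(0);
                if (ret != RSMI_STATUS_SUCCESS) {
                    std::fprintf(stderr, "%d: rsmi_init failed (%d)\n",
                                 rank, static_cast<int>(ret));
                    std::exit(1);
                }
                rsmi_shut_down();
            }
            std::exit(0);
        }
    }

    // Parent waits for all child processes.
    int status = 0;
    while (wait(&status) > 0) {}
    return 0;
}
```

Running this on a single node should hit the same concurrent-initialization path as the RCCL/PyTorch setup described above.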
> Thanks for looking into this. Although multiple clients can access rocm_smi_lib at the same time, some functions only allow access from one process at a time. The shared...
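For context, the serialization pattern the quote refers to (a mutex shared between processes) generally looks something like the sketch below; the segment name and struct layout are made up here, this is not rocm_smi_lib's actual code:

```cpp
// Sketch of a process-shared, robust pthread mutex living in POSIX shared
// memory. Build with -lpthread (and -lrt on older glibc).
#include <fcntl.h>
#include <pthread.h>
#include <sys/mman.h>
#include <unistd.h>

#include <cerrno>
#include <cstdio>

struct SharedLock {
    pthread_mutex_t mutex;
};

int main() {
    // Create (or attach to) a named shared-memory segment visible to all
    // processes on the node.
    int fd = shm_open("/example_smi_lock", O_CREAT | O_RDWR, 0666);
    if (fd < 0) { std::perror("shm_open"); return 1; }
    if (ftruncate(fd, sizeof(SharedLock)) != 0) { std::perror("ftruncate"); return 1; }

    auto* lock = static_cast<SharedLock*>(
        mmap(nullptr, sizeof(SharedLock), PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0));

    // Initialize the mutex as process-shared and robust, so a lock held by a
    // crashed process can be recovered instead of deadlocking everyone else.
    // (A real implementation would guard this so only the creating process
    // runs pthread_mutex_init.)
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
    pthread_mutex_init(&lock->mutex, &attr);

    int ret = pthread_mutex_lock(&lock->mutex);
    if (ret == EOWNERDEAD) {
        // Previous owner died while holding the lock; mark it consistent.
        pthread_mutex_consistent(&lock->mutex);
    }

    // ... serialized section: only one process at a time gets here ...

    pthread_mutex_unlock(&lock->mutex);
    munmap(lock, sizeof(SharedLock));
    close(fd);
    return 0;
}
```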
Has there been any progress on this issue? The problem is still present in ROCm 5.0.2 when launching PyTorch with 8 GPUs/node on OLCF Crusher.
```
15: pthread_mutex_unlock: Success
15: ...
```
Obviously, a GPU kernel to multiply the particle forces and reduce the CV will have to be implemented as well.
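For what it's worth, a rough sketch of what such a kernel might look like (HIP, with placeholder array names; the actual per-particle derivative `dcv_dr` and the bias factor depend on the CV being implemented):

```cpp
// Sketch only: scale per-particle forces by the chain-rule factor and reduce
// the per-particle CV contributions into a single value.
#include <hip/hip_runtime.h>

__global__ void apply_cv_force_and_reduce(const float3* dcv_dr,         // dCV/dr_i
                                          const float* cv_per_particle, // CV contribution of particle i
                                          float bias,                   // e.g. -dV/dCV
                                          float3* force,                // in/out particle forces
                                          float* cv_total,              // global CV sum (zeroed before launch)
                                          int n)
{
    extern __shared__ float scratch[];
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    float local_cv = 0.0f;
    if (i < n) {
        // Bias force: F_i += bias * dCV/dr_i
        force[i].x += bias * dcv_dr[i].x;
        force[i].y += bias * dcv_dr[i].y;
        force[i].z += bias * dcv_dr[i].z;
        local_cv = cv_per_particle[i];
    }

    // Block-level tree reduction of the CV contributions.
    scratch[threadIdx.x] = local_cv;
    __syncthreads();
    for (int offset = blockDim.x / 2; offset > 0; offset >>= 1) {
        if (threadIdx.x < offset)
            scratch[threadIdx.x] += scratch[threadIdx.x + offset];
        __syncthreads();
    }

    // One atomic per block accumulates into the global CV value.
    if (threadIdx.x == 0)
        atomicAdd(cv_total, scratch[0]);
}
```

The launch would need `blockDim.x * sizeof(float)` bytes of dynamic shared memory and a power-of-two block size for the tree reduction.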