ghostplant
Hi @whchung, `HIP_VISIBLE_DEVICES=-1` is not what I'm after. I want to test how the HIP data path uses `libmcwamp_cpu.so` when an AMD GPU (or `libmcwamp_hsa.so`) is not available, because I think `hcc`...
@whchung So do you mean `libmcwamp_cpu.so` is actually not useful?
@whchung Is there a user example showing what CPU mode (`libmcwamp_cpu.so`) is used for? Thanks!
One GPU per machine? Can you explain how many machines you'd like to run it on? Or do you just want to run it with 1 GPU on 1 machine?
If you run it on a one-GPU machine, it seems you need to make sure that GPU has enough memory to store the parameters of all 32 experts. The way to convert `swin_moe_small_patch4_window12_192_32expert_32gpu_22k.pth`...
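Just in case it helps, here is a rough way to estimate whether the parameters alone would fit; this is a minimal sketch, and the `'model'` key / checkpoint layout are assumptions you may need to adjust:

```python
import torch

# Load the MoE checkpoint on CPU and estimate how much GPU memory its
# parameters alone would need (optimizer states / activations excluded).
ckpt = torch.load(
    'swin_moe_small_patch4_window12_192_32expert_32gpu_22k.pth',
    map_location='cpu')

# The top-level key layout is an assumption; adjust to the actual checkpoint.
state = ckpt.get('model', ckpt)

total_bytes = sum(t.numel() * t.element_size()
                  for t in state.values() if torch.is_tensor(t))
print('parameters: %.2f GiB' % (total_bytes / 2 ** 30))
```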
According to bandwidth profiling, there is no speed difference between `ncclInt8 x N` and `ncclInt32 x N / 4`, so you can choose either.
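For reference, here is a sketch showing the two options side by side with `torch.distributed` (assuming the script is launched with `torchrun` so an NCCL process group can be created); only the element type/count bookkeeping differs, the bytes moved over the wire are the same:

```python
import torch
import torch.distributed as dist

# Assumes launch via torchrun, which sets the rank/world-size env vars.
dist.init_process_group(backend='nccl')
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

n = 1 << 26  # payload size in bytes, divisible by 4

buf_int8 = torch.empty(n, dtype=torch.int8, device='cuda')
dist.broadcast(buf_int8, src=0)           # ncclInt8  x N

buf_int32 = buf_int8.view(torch.int32)    # same storage reinterpreted, N / 4 elements
dist.broadcast(buf_int32, src=0)          # ncclInt32 x N / 4
```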
You can refer to the implementation here: https://github.com/microsoft/tutel/blob/main/tutel/experts/ffn.py
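In case a quick overview helps before reading the file: the core idea there is a batched two-layer FFN where every local expert owns its own weights. A minimal sketch (the names and shapes below are my own, not the actual signatures in `ffn.py`):

```python
import torch

class ExpertFFN(torch.nn.Module):
    # Minimal batched expert FFN: each local expert has its own pair of
    # weight matrices, and tokens already grouped per expert are processed
    # with one batched matmul per layer. Shapes/names are illustrative only.
    def __init__(self, num_local_experts, model_dim, hidden_dim):
        super().__init__()
        self.w1 = torch.nn.Parameter(
            torch.empty(num_local_experts, model_dim, hidden_dim).normal_(std=0.02))
        self.w2 = torch.nn.Parameter(
            torch.empty(num_local_experts, hidden_dim, model_dim).normal_(std=0.02))

    def forward(self, x):
        # x: [num_local_experts, tokens_per_expert, model_dim]
        h = torch.relu(torch.matmul(x, self.w1))
        return torch.matmul(h, self.w2)
```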
It stores the list of unique destination indices that input tokens will be written to in the following dispatch step.
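A toy illustration of that idea (not Tutel's actual kernel): an exclusive cumsum over the one-hot routing mask gives each token a unique slot inside its chosen expert's buffer, and those slots are the destination indices the dispatch then scatters tokens into:

```python
import torch

# One-hot routing mask: which expert each token goes to.
mask = torch.tensor([[1, 0],    # token 0 -> expert 0
                     [0, 1],    # token 1 -> expert 1
                     [1, 0]])   # token 2 -> expert 0

# Exclusive cumsum per expert column gives each token its slot
# (destination index) inside that expert's buffer.
locations = torch.cumsum(mask, dim=0) - 1
dest_in_expert = (locations * mask).sum(dim=1)
print(dest_in_expert)  # tensor([0, 0, 1])
```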
What about `export FAST_CUMSUM=0` first?
Gotcha, this problem is not from `tutel::cumsum`. Instead, you may have an improper installation of Tutel that only enables CPU support rather than CUDA support. The root cause could be an...
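Before reinstalling, you can quickly confirm whether the environment itself is CPU-only; these are generic PyTorch checks, not Tutel-specific ones:

```python
import torch

# If either check fails, only CPU support was available when Tutel was
# installed or run, so its CUDA path could not be enabled.
print('torch.version.cuda      :', torch.version.cuda)        # None  => CPU-only PyTorch build
print('torch.cuda.is_available :', torch.cuda.is_available())  # False => no usable CUDA device/driver

# If both look fine, reinstalling Tutel in this same environment
# should pick up the CUDA path.
```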