tutel
[Question] Why is the ncclInt8 datatype used in nccl_all_to_all_scatter_async?
I'm wondering why the ncclInt8 datatype is used in the C++ implementation of nccl_all_to_all_scatter_async. Is it for speed reasons, or simply to avoid supporting multiple datatypes through templates?
Thanks!
According to bandwidth profiling, there is no speed difference between ncclInt8 x N and ncclInt32 x N / 4, so you can choose either.
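To make the equivalence concrete, here is a minimal sketch of the byte arithmetic behind the two choices. It does not call NCCL; `WireType`, `element_size`, `element_count`, and `bytes_on_wire` are hypothetical names used only for illustration, and the sketch assumes the buffer size is a multiple of the element size.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the two datatypes discussed above.
// This is NOT the real ncclDataType_t enum.
enum class WireType { Int8, Int32 };

// Size in bytes of one element of the given wire type.
constexpr std::size_t element_size(WireType t) {
    return t == WireType::Int8 ? sizeof(std::int8_t) : sizeof(std::int32_t);
}

// Element count to pass so that `total_bytes` bytes are transferred.
// Assumes total_bytes is a multiple of the element size.
constexpr std::size_t element_count(std::size_t total_bytes, WireType t) {
    return total_bytes / element_size(t);
}

// Bytes actually moved for a given element count and wire type.
constexpr std::size_t bytes_on_wire(std::size_t count, WireType t) {
    return count * element_size(t);
}

// N int8 elements and N / 4 int32 elements describe the same payload.
static_assert(bytes_on_wire(4096, WireType::Int8) ==
              bytes_on_wire(4096 / 4, WireType::Int32),
              "both describe a 4096-byte transfer");
```

Both descriptions move the same number of bytes, which is consistent with the profiling result above. A byte count also works for a buffer of any element type without requiring its size to be divisible by 4, though whether that motivated the choice in tutel is not stated here.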