ghostplant
Have you tried: `export FAST_CUMSUM=0`?
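If it is easier to set the flag from Python than from the shell, here is a minimal sketch, assuming the variable only needs to be visible before Tutel's kernels are loaded:

```py
import os

# Disable the fast cumsum kernel path. Set the variable before importing
# tutel so it is already visible when the extension loads.
os.environ['FAST_CUMSUM'] = '0'

import tutel  # noqa: E402  -- imported after setting the environment variable
```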
I don't suggest installing the CUDA toolkit from the default Ubuntu repository, as those packages are too old. You should follow the instructions here: https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&Distribution=Ubuntu After the CUDA SDK is successfully installed, please purge...
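After installing from NVIDIA's repository, a quick sanity check (a sketch; the `nvcc` and `CUDA_HOME` locations depend on how the SDK was installed) is to confirm which toolkit PyTorch actually sees:

```py
import subprocess
import torch
from torch.utils.cpp_extension import CUDA_HOME

# CUDA runtime version this PyTorch build was compiled against
print('torch.version.cuda =', torch.version.cuda)

# Toolkit location that torch's C++/CUDA extension builder will use
print('CUDA_HOME =', CUDA_HOME)

# nvcc found on PATH; this should report the freshly installed SDK,
# not the old Ubuntu-repository one (raises FileNotFoundError if nvcc is missing)
print(subprocess.run(['nvcc', '--version'], capture_output=True, text=True).stdout)
```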
Hi, thanks for your info. According to the tracing, this is not a bug; rather, your code doesn't use it in the correct way: CUDA's evaluation from your code is based...
Hi, thanks for reporting this issue. For a low-equipped distributed environment (e.g. Ethernet with low-end bus bandwidth), cross-node All2All is expected to show a significant drop in bandwidth utilization compared with single-node training, as...
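For intuition, here is a rough back-of-envelope cost model of the per-layer All2All time; all numbers below are hypothetical placeholders, not measurements from your setup:

```py
# Rough All2All cost model: each rank exchanges (world_size - 1) / world_size
# of its dispatched tokens with the other ranks on every MoE layer.
tokens_per_rank = 4096          # hypothetical number of tokens per GPU
model_dim = 2048                # hypothetical hidden size
bytes_per_elem = 2              # fp16
world_size = 16

payload_bytes = tokens_per_rank * model_dim * bytes_per_elem
cross_rank_bytes = payload_bytes * (world_size - 1) / world_size

# Two very different effective bus bandwidths, in GB/s
for name, busbw_gbps in [('intra-node NVLink', 200.0), ('low-end Ethernet', 1.0)]:
    # 2 All2All calls per MoE layer: dispatch + combine
    t_ms = 2 * cross_rank_bytes / (busbw_gbps * 1e9) * 1e3
    print(f'{name}: ~{t_ms:.2f} ms of All2All per MoE layer')
```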
This is an improper environment configuration that is not recognized by PyTorch. Can you make a copy of nccl.h in /usr/include, and a copy of libnccl.so in /usr/lib/x86_64-linux-gnu? (A symlink is also fine.)
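A small check for whether the files ended up where the build tooling usually looks on Ubuntu (a sketch; adjust the paths for your distribution):

```py
import os

# Locations typically searched on Ubuntu; either real copies or symlinks
# at these paths are fine.
expected = [
    '/usr/include/nccl.h',
    '/usr/lib/x86_64-linux-gnu/libnccl.so',
]

for path in expected:
    status = 'found' if os.path.exists(path) else 'MISSING'
    print(f'{status}: {path}')
```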
Do you mean Megatron and DeepSpeed individually, or all of them working together?
Yes, Tutel is just an MoE layer implementation which is pluggable into any distributed framework. The way for another framework to use the Tutel MoE layer is by passing the distributed processing...
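A minimal sketch of what passing the process group could look like, assuming a Tutel version whose `moe_layer` constructor accepts a `group=` argument and the README-style `gate_type`/`experts` dictionaries; argument names vary between releases, so treat this as illustrative rather than the exact API:

```py
import torch
import torch.distributed as dist
from tutel import moe as tutel_moe

# The host framework (Megatron, DeepSpeed, ...) usually initializes this.
dist.init_process_group('nccl')
group = dist.new_group(ranks=list(range(dist.get_world_size())))  # or any subgroup

# Hypothetical configuration values; check the signature of your installed version.
moe = tutel_moe.moe_layer(
    gate_type={'type': 'top', 'k': 2},
    model_dim=1024,
    experts={'type': 'ffn', 'count_per_node': 2, 'hidden_size_per_expert': 4096},
    group=group,                    # process group supplied by the framework
).cuda()

x = torch.randn(8, 512, 1024, device='cuda')
y = moe(x)
```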
Can you explain why "x == y" for `y = fast_encode(x.to(logits_dtype), crit, self.is_postscore).to(x.dtype)`?
Can you set `gate_noise = 0` for both and check if they produce the same results?
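One way to carry out both checks above, using hypothetical handles `moe_a` / `moe_b` for the two solutions, each built with `gate_noise = 0` so the routing is deterministic:

```py
import torch

def compare_outputs(moe_a, moe_b, model_dim=1024):
    """Run both MoE solutions on the same input and compare the results.

    moe_a / moe_b are hypothetical handles to the two implementations,
    both assumed to be configured with gate_noise = 0.
    """
    torch.manual_seed(0)
    x = torch.randn(8, 512, model_dim, device='cuda', dtype=torch.float16)
    with torch.no_grad():
        y_a, y_b = moe_a(x), moe_b(x)
    print('bitwise equal:', torch.equal(y_a, y_b))
    print('allclose     :', torch.allclose(y_a, y_b, rtol=1e-3, atol=1e-3))
    print('max abs diff :', (y_a - y_b).abs().max().item())
```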
OK, can you help provide these things? For both solutions, please add the following code after `y = fast_encode(..)`:

In the example code:
```py
...
torch.save([x, crit, y], 'test_cast_example.py')
```
...
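Once both runs have saved their tensors, they can be reloaded offline and compared; a sketch, where the second filename is a hypothetical placeholder for whatever the other run saves:

```py
import torch

# Load the artifacts saved by the snippets above (map to CPU so this can be
# inspected on any machine).
x_ref, crit_ref, y_ref = torch.load('test_cast_example.py', map_location='cpu')
x_new, crit_new, y_new = torch.load('test_case_other.pt', map_location='cpu')  # hypothetical name

print('x identical:', torch.equal(x_ref, x_new))
print('y allclose :', torch.allclose(y_ref.float(), y_new.float(), rtol=1e-3, atol=1e-3))
print('max |dy|   :', (y_ref.float() - y_new.float()).abs().max().item())
```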