ghostplant

272 comments by ghostplant

Hi, the current `fast_encode(x, )` requires `x` to be contiguous, which your model's input is not, so you can get the correct result by calling `fast_encode(x.contiguous())`. If you directly use MoELayer,...
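To illustrate what "contiguous" means in this context, here is a minimal pure-Python sketch (not Tutel or PyTorch code) that checks whether a shape/stride pair describes a dense row-major layout. The helper names `row_major_strides` and `is_contiguous` are hypothetical and exist only for this illustration; in PyTorch the corresponding calls are `x.is_contiguous()` and `x.contiguous()`.

```python
def row_major_strides(shape):
    """Strides (in elements) of a dense row-major tensor with this shape."""
    strides = []
    step = 1
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))


def is_contiguous(shape, strides):
    """True if the strides describe a dense row-major layout.

    Simplification for this sketch: dimensions of size 1 are ignored,
    because their stride never affects addressing.
    """
    expected = row_major_strides(shape)
    return all(d == 1 or s == e for d, s, e in zip(shape, strides, expected))


# A (4, 8) tensor stored densely has strides (8, 1).
dense_shape = (4, 8)
dense_strides = row_major_strides(dense_shape)

# Transposing swaps shape and strides without moving any data:
# shape (8, 4) with strides (1, 8) is a view over the same memory.
transposed_shape = (8, 4)
transposed_strides = (1, 8)

print(is_contiguous(dense_shape, dense_strides))            # dense layout
print(is_contiguous(transposed_shape, transposed_strides))  # transposed view
```

A transposed or sliced view like the second case is the kind of input that would need `.contiguous()` before being handed to a kernel that assumes dense memory.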

For now, the score tensor applies to either `x` or `y`, as specified by `is_postpone`. Do you want to never use the score tensor? If so, the gating...

For your purpose, I think you need to delete `*self.gates_` from [L125 and L129](https://github.com/microsoft/tutel/blob/main/tutel/impls/fast_dispatch.py#L125-L129), and rebuild from source.

This is usually caused by an environment issue in which PyTorch fails to find the CUDA SDK. Can you print the log of the installation command below: ``` python3 -m pip install --verbose --user...
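As a quick sanity check on the environment, the sketch below guesses where a CUDA toolkit might be found using only the standard library. It assumes the lookup order commonly used by extension builders (the `CUDA_HOME` or `CUDA_PATH` environment variables, then `nvcc` on `PATH`, then `/usr/local/cuda`); the function name `guess_cuda_home` is hypothetical, and this is not PyTorch's or Tutel's own code.

```python
import os
import shutil


def guess_cuda_home(env=None, which=shutil.which):
    """Best-effort guess of the CUDA toolkit root, or None if nothing is found.

    `env` and `which` are injectable so the lookup can be tested without CUDA.
    """
    env = os.environ if env is None else env

    # 1. Explicit environment variables take priority.
    home = env.get("CUDA_HOME") or env.get("CUDA_PATH")
    if home:
        return home

    # 2. Otherwise derive the root from the nvcc location: <root>/bin/nvcc.
    nvcc = which("nvcc")
    if nvcc:
        return os.path.dirname(os.path.dirname(nvcc))

    # 3. Fall back to the conventional Linux install location, if present.
    default = "/usr/local/cuda"
    return default if os.path.isdir(default) else None


print("CUDA toolkit guess:", guess_cuda_home())
```

If this prints `None`, the toolkit is probably not visible to the build, which would be consistent with the installation failing to compile the custom kernels.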

Thanks. What about the standard output of this: ```sh python3 -c 'import torch; import tutel_custom_kernel' ```

Can you find where this file is located in your anaconda3 environment: ```sh find /home/ubuntu/anaconda3 | grep tutel_custom_kernel ``` Your anaconda3 doesn't automatically add it to the...
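As a complement to the `find` command, the following sketch asks the running interpreter where it would load a module from, which helps tell "the file is not installed" apart from "the file exists but is not on this interpreter's search path". The helper name `locate_module` is hypothetical; only `importlib.util.find_spec` and `sys.path` from the standard library are used.

```python
import importlib.util
import sys


def locate_module(name):
    """Return the path Python would load `name` from, or None if it is not importable."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# None here means this interpreter cannot see the extension at all.
print("tutel_custom_kernel ->", locate_module("tutel_custom_kernel"))

# The directories this interpreter searches, in order.
for entry in sys.path:
    print("search path:", entry)
```

If `find` locates the file on disk but `locate_module` returns `None`, the directory containing it is most likely missing from the search path of the Python being run.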

> I'm sorry, I did follow the installation procedures, but I still couldn't find the file 'tutel_custom_kernel' in dist-packages. I'm not sure which part went wrong. I use CUDA 11.6...

Because those shared libraries cannot be located on disk, the PyTorch C++ modules can't load at initialization.

> > Thanks! I reinstalled CUDA and torch, updated tutel to the latest version, and it works! Thanks for your patience, that really helps me a lot. > > Can...

Did you run with a large number of GPUs? In our experiments, 2DH gradually becomes faster than Linear A2A once the number of distributed A100s is over 256. Otherwise,...