ghostplant
Any suggestions? A thousand-line kernel source file takes hours to compile.
@b-sumner Usually, a TVM engine compiling a large model generates thousand-line source files containing many sub-kernels. It takes 3-4 hours to compile such files, while nvcc takes...
@b-sumner clang-9 from ROCm compiles sequentially, while nvcc compiles source files using multiple processes.
@b-sumner I think the problem is caused by putting too many `__global__` kernels within one source file. It is possible to split them into separate standalone source files, with only...
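For illustration only, here is a rough sketch of that workaround: a script that splits a generated multi-kernel source file into one file per `__global__` kernel and compiles the pieces concurrently. The file names, the `hipcc` flags, and the naive splitting heuristic are assumptions for this sketch, not code taken from TVM or tutel.

```python
# Hedged sketch: split a generated multi-kernel .cu file into one file per
# `__global__` kernel, then compile the pieces in parallel. Paths, compiler
# name, and flags are illustrative assumptions.
import re
import subprocess
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path

SOURCE = Path("generated_kernels.cu")          # hypothetical generated file
OUT_DIR = Path("split_kernels")
OUT_DIR.mkdir(exist_ok=True)

text = SOURCE.read_text()
# Naive split: assume the file is a shared preamble followed by `__global__` kernels.
parts = re.split(r"(?=__global__\s)", text)
preamble, kernels = parts[0], parts[1:]

files = []
for i, body in enumerate(kernels):
    f = OUT_DIR / f"kernel_{i}.cu"
    f.write_text(preamble + body)              # each piece keeps the shared preamble
    files.append(f)

def compile_one(src: Path) -> int:
    # one compiler process per kernel file, instead of one sequential clang invocation
    return subprocess.call(["hipcc", "-c", str(src), "-o", str(src.with_suffix(".o"))])

with ProcessPoolExecutor() as pool:            # compile the pieces concurrently
    results = list(pool.map(compile_one, files))

print("all succeeded" if all(r == 0 for r in results) else "some compilations failed")
```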
You can follow this example: https://github.com/microsoft/tutel/blob/main/tutel/examples/helloworld_demo.py, which can be executed with: `python3 -m tutel.examples.helloworld_demo --batch_size=16`
Is that a static parameter that can be set just in the `__init__` function of `CustomExpertDemo`?
It still needs a few API upgrades to meet your requirement.
You need to feed the extra argument data you need here: https://github.com/microsoft/tutel/blob/main/tutel/impls/moe_layer.py#L238, where `self.experts` is the layer object created from your custom `CustomExpertDemo`. You also need to extend the corresponding argument list...
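As a rough illustration only (the exact tutel signatures may differ): a minimal sketch of a custom expert whose `forward` accepts one extra tensor, plus a comment showing where the MoE layer's call site linked above would thread it through. The `extra_data` name and the `forward(x, ctx)` shape are assumptions, not tutel's documented API.

```python
# Hedged sketch (names and signatures are assumptions, not tutel's documented API):
# a custom expert whose forward() accepts one extra tensor.
import torch

class CustomExpertDemo(torch.nn.Module):
    def __init__(self, model_dim):
        super().__init__()
        self.fc = torch.nn.Linear(model_dim, model_dim)

    def forward(self, x, ctx=None, extra_data=None):
        # `extra_data` is the per-call argument forwarded from the MoE layer;
        # here it is simply added as a bias-like term for illustration.
        y = self.fc(x)
        if extra_data is not None:
            y = y + extra_data
        return y

# Inside the MoE layer's forward (around the linked line in moe_layer.py), the
# call to the expert would be extended to pass the argument through, roughly:
#     expert_output = self.experts(dispatched_input, ctx, extra_data=extra_data)
# after adding `extra_data` to the MoE layer's own forward(...) argument list.
```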
> When I use the Customexpert, it stopped here:
>
> ```python
> if ctx.sharded_count > 1:
>     raise Exception("`sharded_count > 1` is not implemented within this expert, Model parallel is disabled.")
> ```
> ...
So it looks like `num_global_experts` is smaller than the number of GPUs, right?
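To make the check concrete, here is a sketch of the arithmetic I believe is behind that exception (an assumption for illustration, not a quote of tutel's source): when there are fewer global experts than GPUs, each expert is sharded across several GPUs, so `sharded_count` exceeds 1 and the custom expert raises.

```python
# Hedged sketch of the assumed relationship: fewer global experts than GPUs
# implies each expert is split across several GPUs, i.e. sharded_count > 1.
def sharded_count(world_size: int, num_global_experts: int) -> int:
    # When experts outnumber (or match) the GPUs, every GPU holds whole experts.
    if num_global_experts >= world_size:
        return 1
    # Otherwise one expert is split across world_size / num_global_experts GPUs.
    return world_size // num_global_experts

assert sharded_count(world_size=8, num_global_experts=8) == 1   # no sharding
assert sharded_count(world_size=8, num_global_experts=2) == 4   # triggers the exception path
```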