ghostplant

Results 272 comments of ghostplant

Hi, the mod files are generated by a compiler project `autort`, which is an integration of different compilation backends. Even if the mod files seem to share the same format, they may...

The numbers with bsz=1 and MTP=0 would be far below that, which is why MTP with a 100% success rate helps a lot. But I have no idea how this question is related to this topic; we...
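
For a rough sense of the gap, here is a back-of-the-envelope sketch; the proposal length `k` and the acceptance rate are assumptions used only to make the arithmetic concrete, not measured numbers:

```python
# Rough decode-throughput estimate with multi-token prediction (MTP).
# Assumption: each step proposes k extra tokens and a fraction `accept`
# of them is kept; with 100% acceptance every step emits (1 + k) tokens.
def tokens_per_step(k: int, accept: float) -> float:
    return 1.0 + k * accept

baseline = tokens_per_step(k=0, accept=0.0)   # bsz=1, MTP disabled -> 1 token/step
with_mtp = tokens_per_step(k=1, accept=1.0)   # 100%-success MTP    -> 2 tokens/step
print(with_mtp / baseline)                    # roughly 2x more tokens per forward step
```
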

May I ask the reason for removing `system.cache()`? By the way, the patch was intended for a very old Fairseq checkpoint. Since it is impractical to keep patches up-to-date with...

Removing the cache makes it impossible to recall each balance loss generated during the forward pass when calculating the loss. As a result, training may become increasingly imbalanced,...
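
The role of that cache can be pictured with a minimal sketch; the names (`_balance_losses`, `moe_layer_forward`, the toy penalty) are illustrative, not the actual Tutel internals. Each MoE layer records its auxiliary balance loss during the forward pass, and the training loop drains the cache to fold those terms into the total loss:

```python
import torch

# Illustrative stand-in for the cache: each MoE layer appends its
# auxiliary load-balancing loss here during the forward pass.
_balance_losses = []

def moe_layer_forward(x, gate_logits):
    # ... routing / expert computation would happen here ...
    probs = torch.softmax(gate_logits, dim=-1)
    balance_loss = probs.var(dim=-1).mean()   # toy balance penalty
    _balance_losses.append(balance_loss)      # the "system.cache()"-like recall point
    return x

def total_loss(task_loss, aux_weight=0.01):
    # Without the cache, these per-layer losses cannot be recalled here,
    # so the balancing term silently drops out of the objective.
    aux = torch.stack(_balance_losses).sum() if _balance_losses else 0.0
    _balance_losses.clear()
    return task_loss + aux_weight * aux
```
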

Nope. In the `data` type, non-shared parameters are handled in ZeRO-2 style, so their gradients are still unique and independent.
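
A minimal sketch of what "unique and independent in gradients" means in ZeRO-2 style; this is not the actual Tutel code, and it assumes the gradient length is divisible by the world size. Instead of an all-reduce that leaves every rank with an identical full gradient, each rank ends up owning only its own reduced shard:

```python
import torch
import torch.distributed as dist

def zero2_style_grad_sync(grad: torch.Tensor) -> torch.Tensor:
    """ZeRO-2 style gradient sharding sketch: each rank keeps only its
    own reduced shard, so the shards remain unique and independent
    across ranks (unlike a plain data-parallel all-reduce)."""
    world = dist.get_world_size()
    shards = [c.contiguous() for c in grad.chunk(world)]
    my_shard = torch.empty_like(shards[0])
    dist.reduce_scatter(my_shard, shards)   # each rank receives a distinct shard
    return my_shard
```
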

Hi, unless you want to change the training GPU environment, you don't really need to do the conversion. Assume your model is 20GB of shared parameters and 800GB of non-shared...
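
A back-of-the-envelope illustration of why the shared/non-shared split matters; the GPU count below is an assumption used only to make the arithmetic concrete:

```python
# Hedged arithmetic sketch: shared parameters are replicated on every GPU,
# while non-shared (expert) parameters are partitioned across GPUs.
shared_gb, non_shared_gb = 20, 800
num_gpus = 64                      # assumed, only to make the numbers concrete
per_gpu = shared_gb + non_shared_gb / num_gpus
print(per_gpu)                     # 20 + 12.5 = 32.5 GB of parameters per GPU
```
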

Hello, we found that the fairseq_moe instructions are too old **while the official fairseq has also stopped maintenance** and **the dataset link no longer works either**, so we're going to remove this...

Hello, Tutel currently does not support Huawei Ascend because we do not have the hardware model and SDK for it. However, we would be willing to support it if it...

Hello, dp is `parallel_type == 0`, which uses all_gather for ZeRO-2. This type is usually slower, especially when the expert parameters are larger than the activation sizes.
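
The intuition can be put into numbers with a rough sketch; the sizes below are assumptions. With `parallel_type == 0` every step moves the expert parameters via all_gather, whereas expert parallelism moves the routed activations via all_to_all, so the former loses as soon as the parameter volume exceeds the activation volume:

```python
# Rough per-step communication-volume comparison (assumed sizes, in GB).
expert_params_gb = 8.0      # non-shared expert parameters to all_gather
activations_gb   = 0.5      # routed activations to all_to_all per step

dp_traffic = expert_params_gb   # parallel_type == 0: all_gather of expert weights
ep_traffic = activations_gb     # expert parallelism: all_to_all of activations
print('dp is slower' if dp_traffic > ep_traffic else 'dp is fine')
```
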

The `max_tokens` field isn't handled through the REST JSON API yet. Instead, it is currently a static global setting specified by the argument `--max_seq_len` (the version 20250715 has a fine-grain...
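
In other words, a hypothetical request just to illustrate the current behaviour; the endpoint path and field names here are assumptions, not the project's documented API. A `max_tokens` field in the JSON body is ignored, and the generation length is governed by the server-side flag:

```python
import requests

# Hypothetical illustration: the per-request field is ignored today,
# so the effective limit is whatever the server was started with,
# e.g. `--max_seq_len 4096`.
resp = requests.post('http://localhost:8000/v1/completions', json={
    'prompt': 'Hello',
    'max_tokens': 128,   # currently not honoured by the REST JSON API
})
print(resp.json())
```
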