Boxiang Wang comments

Results: 30 comments of Boxiang Wang

ColossalAI cannot run the shufflenet_v2_x1_0 model as torch do

Hi, could you provide your training code for us to reproduce this bug? Besides, could you double-check your dataset settings?

ColossalAI cannot run the shufflenet_v2_x1_0 model as torch do

I have tried our code with a simple change of model from resnet to shufflenet. It takes about 32521 MiB with `BATCH_SIZE = 16384`, and no OOM occurred.

ColossalAI cannot run the shufflenet_v2_x1_0 model as torch do

Hi @songyuc, you can uninstall your current `colossalai` and install our latest version with

```
git clone https://github.com/hpcaitech/ColossalAI.git
cd ColossalAI
# install dependency
pip install -r requirements/requirements.txt
# install colossalai...
```

ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -9)

Have you tried modifying the [.wslconfig](https://learn.microsoft.com/en-us/windows/wsl/wsl-config) file to give WSL more memory and more processors? It works for me.
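
As a rough sketch of that suggestion, a `.wslconfig` placed in the Windows user profile directory (`%UserProfile%\.wslconfig`) could look like the fragment below. The `memory` and `processors` keys under `[wsl2]` are documented WSL 2 settings; the values shown are illustrative assumptions, not the commenter's actual configuration, and should be sized to the host machine.

```ini
# %UserProfile%\.wslconfig
# Values are placeholders (assumed); adjust to the host's RAM and core count.
[wsl2]
# Upper limit on RAM the WSL 2 VM may use
memory=32GB
# Number of logical processors exposed to the WSL 2 VM
processors=8
```

The change only takes effect after WSL restarts, for example by running `wsl --shutdown` from Windows and then reopening the distribution.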

Update detr attention

Add assertion for always save nemo add model parallel size

Yes, this was an NVbug about NeMo 1.0. We are not going to save `.nemo` in 2.0 right now.

Add assertion for always save nemo add model parallel size

@maanug-nv Can you help approve this again? It just passed all tests.

[BUG] Checkpoint state dict remapping is not applied for MLA layers

I don't think this change can be applied generally to all kinds of model loading. Maybe it should be added per customer need.

[BUG] Dual meaning of `max_position_embeddings`, computing both embedding shape & yarn scaling base

Hi, thanks for raising this issue. We were aware of this bug and have already come up with a fix for the 0.11 release. It will be further integrated with other pos_emb...

[BUG] Dual meaning of `max_position_embeddings`, computing both embedding shape & yarn scaling base

Hi @yzlnew, it should be fixed with https://github.com/NVIDIA/Megatron-LM/blob/00efe37a85194a521789778ae47299ce8c054dc0/megatron/core/transformer/multi_latent_attention.py#L363