JackieWu

China 1/6 out of the gravity

Results: 88 comments of JackieWu

Hello, could you please give me some advice on why the size of the TinyCLIP-ViT-39M-16-Text-19M.bin model I distilled is not 300mb but 900mb, thanks very much!!!

It also contains the master weight and the optimizer states. You can keep the value corresponding to the key `state_dict` only.

```python
ckpt = torch.load(checkpoint_fname)
new_ckpt = dict(state_dict=ckpt['state_dict'])
torch.save(new_ckpt, saved_fname)
```

...

Compatibility Issue with H100 GPU, CUDA 12.2, and PyTorch 2.1 - AttributeError: module 'rpe_index_cpp' has no attribute 'forward_gpu'

Hi @gudrb , thanks for your attention to our work! It seems that the GPU operator is not built. Could you please try to rebuild the RPE operators? The environment...

Compatibility Issue with H100 GPU, CUDA 12.2, and PyTorch 2.1 - AttributeError: module 'rpe_index_cpp' has no attribute 'forward_gpu'

> I solved this problem by installing a slightly different version of PyTorch.
>
> For CUDA 12.2, I used the following command:
>
> conda install pytorch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0...

Make new optimizer more extensible, easier to integrate downstream for FSDP

@muellerzr Thanks for your contribution! The PR looks good to me. Sorry that I am not at Microsoft and do not have the authorization to review and merge the pull...

Does MS-AMP support FP8 all-gather?

Hi @zigzagcai , thanks for your attention to our work! The FP8 tensor with a scaling factor is stored in a uint8 tensor and a FP32 scalar. Therefore, the FP8...
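The storage layout mentioned in the reply above (one byte per element plus a single scaling factor) can be illustrated with a minimal pure-Python sketch. This is not MS-AMP code: the names `ScaledFP8`, `quantize`, and `dequantize` are hypothetical, the E4M3 (finite-only) bit layout and the "scale so the largest value maps to 448" rule are assumptions chosen for illustration, and a Python `bytes` object and `float` stand in for the uint8 tensor and the FP32 scalar.

```python
from dataclasses import dataclass


def decode_e4m3(byte: int) -> float:
    """Decode one byte in the FP8 E4M3 finite-only layout (1 sign, 4 exponent, 3 mantissa bits)."""
    sign = -1.0 if byte & 0x80 else 1.0
    exp = (byte >> 3) & 0x0F
    man = byte & 0x07
    if exp == 0x0F and man == 0x07:
        return float("nan")                       # the only NaN pattern per sign
    if exp == 0:
        return sign * (man / 8.0) * 2.0 ** -6     # subnormal range
    return sign * (1.0 + man / 8.0) * 2.0 ** (exp - 7)


# All finite codes with their decoded values (NaN patterns 0x7F and 0xFF excluded).
_FINITE = [(b, decode_e4m3(b)) for b in range(256) if (b & 0x7F) != 0x7F]
E4M3_MAX = 448.0


def encode_e4m3(x: float) -> int:
    """Brute-force nearest representable value; out-of-range inputs saturate at +/-448."""
    return min(_FINITE, key=lambda bv: abs(bv[1] - x))[0]


@dataclass
class ScaledFP8:
    payload: bytes   # one FP8 byte per element (stands in for the uint8 tensor)
    scale: float     # single scaling factor (stands in for the FP32 scalar)


def quantize(values):
    """Assumed convention: pick the scale so the largest magnitude maps to E4M3_MAX."""
    amax = max(abs(v) for v in values)
    scale = E4M3_MAX / amax if amax > 0 else 1.0
    return ScaledFP8(bytes(encode_e4m3(v * scale) for v in values), scale)


def dequantize(t: ScaledFP8):
    """Recover approximate values by decoding each byte and dividing by the scale."""
    return [decode_e4m3(b) / t.scale for b in t.payload]


if __name__ == "__main__":
    t = quantize([0.001, -0.02, 0.5, 3.0])
    print(len(t.payload), "payload bytes, scale =", t.scale)
    print(dequantize(t))
```

In this sketch the payload is plain bytes, so anything that can move a byte buffer and one float can move the whole tensor; the values are only meaningful once the bytes and the scale are combined again.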

About the buckets in the IRPE

Hi @Zhong1015 , thank you for your continued support! : ) The definition you provided is correct. The concept of `bucket` comes from the hash algorithm. A relative position `(x1-x2,...
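The hash-style idea of a bucket can be sketched as follows: many relative positions are mapped to a small set of bucket indices, and each bucket owns one learnable value. This is a simplified illustration, not the iRPE implementation: the clipping rule, the parameter `beta`, and the function names are assumptions made for this example only.

```python
def clip(v: int, beta: int) -> int:
    """Clamp a 1-D relative offset into [-beta, beta]."""
    return max(-beta, min(beta, v))


def num_buckets(beta: int = 2) -> int:
    """Number of distinct buckets for 2-D offsets clipped to [-beta, beta]."""
    return (2 * beta + 1) ** 2


def bucket_index(dx: int, dy: int, beta: int = 2) -> int:
    """Map a 2-D relative position (dx, dy) to a bucket id in [0, num_buckets)."""
    side = 2 * beta + 1
    return (clip(dx, beta) + beta) * side + (clip(dy, beta) + beta)


if __name__ == "__main__":
    # One (hypothetical) learnable scalar per bucket; far-apart offsets share a bucket.
    table = [0.0] * num_buckets()
    for dx, dy in [(0, 0), (2, 0), (5, 0), (-7, -9)]:
        idx = bucket_index(dx, dy)
        print((dx, dy), "-> bucket", idx, "value", table[idx])
```

As with a hash table, several keys (relative positions) can land in the same bucket; here offsets beyond `beta` collide with the offset at the boundary, which keeps the parameter table small.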

Installation might be incomplete

Hi @leedrake5 , thanks for your attention to our work! I could not reproduce the issue. It seems that the packages `msamp_arithmetic` and `msamp_adamw` are not copied into the `site-packages`...

Installation might be incomplete

@leedrake5 The custom NCCL library in MS-AMP is used to support all-reduce operations for FP8 weight gradients. If the custom NCCL is not installed, the FP8 all-reduce in Megatron Optimizer...
