HUANG Fei
@yzy5630 To my understanding, LightSeq does not support learnable positional embeddings in the current version. That may explain the differences you see.
In the latest version of fairseq (I'm using https://github.com/pytorch/fairseq/tree/420136acd2a57de22e62f13930aa23e086bcbbf8), ``args.device_id`` is not set correctly, so all LightSeq modules will allocate their memory on device 0. Notice the ``local_rank`` below: https://github.com/bytedance/lightseq/blob/812d9d798e491ab9139c1f36113693308c4c0637/lightseq/training/cli/fs_modules/ls_transformer.py#L148-L160...
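A minimal workaround sketch (``args.device_id`` is the field the linked code reads; whether ``torch.cuda.current_device()`` is the right source for it under your particular launcher is an assumption, so verify against your distributed setup):

```python
import torch

def fix_device_id(args):
    """Point args.device_id at the device this rank actually uses.

    LightSeq reads args.device_id when allocating parameters, so if
    fairseq leaves it at the default 0, every rank allocates on device 0.
    Call this before the LightSeq modules are constructed.
    """
    if torch.cuda.is_available():
        args.device_id = torch.cuda.current_device()
    return args
```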
This project is not a pip package. You need to copy the file, i.e. ``gpu_mem_track.py``, into your working directory.
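A minimal usage sketch, assuming the ``MemTracker`` class exported by ``gpu_mem_track.py`` (check the repo's README for the exact constructor arguments, which have changed between versions):

```python
import torch
from gpu_mem_track import MemTracker  # the file copied into your working directory

gpu_tracker = MemTracker()   # writes a log of GPU tensor changes to a file
gpu_tracker.track()          # snapshot before the allocation

x = torch.randn(1024, 1024, device='cuda')
gpu_tracker.track()          # snapshot after: the new tensor shows up in the log
```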
Total Used Memory is the peak of the memory usage. When you delete some tensors, PyTorch does not release the space back to the device until you call torch.cuda.empty_cache(), like the...
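A short sketch of this caching behavior, using the standard ``torch.cuda`` statistics APIs:

```python
import torch

x = torch.randn(1024, 1024, device='cuda')
print(torch.cuda.memory_allocated())  # ~4 MB held by the tensor
print(torch.cuda.memory_reserved())   # >= allocated: the caching allocator's pool

del x
print(torch.cuda.memory_allocated())  # drops back down
print(torch.cuda.memory_reserved())   # unchanged: the freed block stays cached

torch.cuda.empty_cache()              # return cached blocks to the device
print(torch.cuda.memory_reserved())   # now (close to) 0
```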
Hello. I just answered the question in my PR. It is because the CUDA kernels take some space. If you are interested, you can see the revised text here: https://github.com/hzhwcmhf/Pytorch-Memory-Utils/blob/master/README.md#faqs...
@SCAUapc We use the PyTorch API to obtain the memory usage. You can see the explanation of ``torch.cuda.memory_allocated`` [here](https://pytorch.org/docs/stable/generated/torch.cuda.memory_allocated.html?highlight=memory_allocated#torch.cuda.memory_allocated) > This is likely less than the amount shown in nvidia-smi since...
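A small sketch of the gap: after the first CUDA operation, nvidia-smi also counts the CUDA context and loaded kernel images, which ``memory_allocated`` never sees:

```python
import torch

x = torch.zeros(1, device='cuda')     # first CUDA op: creates the CUDA context
print(torch.cuda.memory_allocated())  # only a few hundred bytes for the tensor
# nvidia-smi at this point typically shows hundreds of MB for this process:
# the CUDA context plus kernel images, which PyTorch's counters do not include.
```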
A classmate already compiled notes last year: https://github.com/thunlp/OOP-THU/issues/47 If your notes only repeat the lecture content, the contribution may be limited, and people probably don't want to see duplicated material. If you have anything new, you are welcome to add it as a supplement to the earlier notes.
@EGalahad First question: yes, ``#ifdef`` and ``#pragma once`` are both preprocessor directives, so they only take effect at compile time. Second question: the ``.h`` file provides the declarations needed to call the functions; if you don't include ``func1.h`` in ``main.cpp``, you have to declare the functions by hand, otherwise the compiler cannot know the types involved in the call.
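A minimal sketch of the two options (only ``func1.h`` comes from the question; the function name ``add`` is a made-up placeholder):

```cpp
// func1.h
#pragma once            // preprocessor directive: handled before compilation
int add(int a, int b);  // declaration only; the definition lives in func1.cpp

// main.cpp -- option 1: include the header to get the declaration
#include "func1.h"

// main.cpp -- option 2: declare by hand (must match the definition exactly)
// int add(int a, int b);

int main() {
    return add(1, 2);   // without a visible declaration, this won't compile
}
```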
I suggest citing https://github.com/thu-coai/THUOOP/issues/11 at the beginning. It would be best to first explain, or link to, what a range-based for loop actually requires (as far as I know, only ``begin`` and ``end``). For the extra ``using`` declarations later on, please give some examples so readers know what they are for.
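A minimal sketch of that requirement (the class and member names are made up for illustration):

```cpp
#include <iostream>

// A range-based for loop only needs begin() and end() returning something
// that supports operator!=, operator++, and operator*.
struct Range {
    int lo, hi;
    struct Iter {
        int v;
        bool operator!=(const Iter& o) const { return v != o.v; }
        void operator++() { ++v; }
        int operator*() const { return v; }
    };
    Iter begin() const { return {lo}; }
    Iter end() const { return {hi}; }
};

int main() {
    for (int x : Range{0, 3})   // desugars to the begin()/end() protocol above
        std::cout << x << ' ';  // prints: 0 1 2
}
```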
You're right, three overloads are indeed needed. Regarding the example, could you go into more detail and give actual code showing the difference between these definitions? I see the STL has a function, std::advance, that is O(1) for random-access iterators and linear for the others. Is its implementation selected based on exactly these traits?
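A sketch of the classic tag-dispatch pattern that standard library implementations typically use for ``std::advance`` (the ``my_``-prefixed names are made up; negative distances and the separate bidirectional overload are omitted for brevity):

```cpp
#include <iterator>
#include <list>
#include <vector>

// O(1) path: random-access iterators support += directly.
template <class It, class Dist>
void my_advance_impl(It& it, Dist n, std::random_access_iterator_tag) {
    it += n;
}

// O(n) fallback: step one element at a time.
template <class It, class Dist>
void my_advance_impl(It& it, Dist n, std::input_iterator_tag) {
    while (n-- > 0) ++it;
}

// Dispatch at compile time on the iterator_category trait.
template <class It, class Dist>
void my_advance(It& it, Dist n) {
    my_advance_impl(it, n,
        typename std::iterator_traits<It>::iterator_category{});
}

int main() {
    std::vector<int> v{1, 2, 3, 4};
    auto vi = v.begin();
    my_advance(vi, 2);  // exact tag match: picks the O(1) overload

    std::list<int> l{1, 2, 3, 4};
    auto li = l.begin();
    my_advance(li, 2);  // bidirectional tag converts to input tag: O(n) overload
}
```

Overload resolution does the work here: the random-access tag matches its overload exactly, while every weaker category converts to its base class ``input_iterator_tag`` and falls into the linear version.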