Jiarui Fang(方佳瑞)

Results 63 issues of Jiarui Fang(方佳瑞)

Hello, I tried to run the example in `lightseq/examples/training/huggingface`. Because I use a gaming PC, I slightly modified the `run_ner.sh` script (two lines, as follows). ``` python3 -m torch.distributed.launch...

/opt/conda/lib/python3.7/site-packages/torch/onnx/utils.py:738: UserWarning: ONNX export failed on ATen operator einsum because torch.onnx.symbolic_opset9.einsum does not exist .format(op_name, opset_version, op_name)) multiprocessing.pool.RemoteTraceback: """ Traceback (most recent call last): File "/opt/conda/lib/python3.7/multiprocessing/pool.py", line 121, in worker...

The logic inside the MultiheadAttention layer is currently too complex to develop against. Moreover, there are bugs in the intermediate management. The first priority is to rewrite this code to make others...

enhancement
help wanted

Using FBGEMM to support CPU quantization.

documentation
enhancement
help wanted

### Describe the feature I found that ColoTensor lacks some basic functionality. - [x] initialize in shard mode from a torch tensor. - [x] save and load in a distributed...

enhancement

### 🐛 Describe the bug I presume that ZeRO does not initialize model parameters correctly. 1. In ZeroInitContext, we adapt a torch param to type ShardedParamV2 when a param is constructed....

bug

### Describe the feature I propose to implement a runtime memory tracer. It can be turned on/off by users. It traces the GPU memory footprint during the training process, or...

help wanted

I read the paper [Maximizing Parallelism in Distributed Training for Huge Neural Networks](https://arxiv.org/abs/2105.14450). The idea is elegant and does make sense to me. However, I just wonder about the compatibility...

enhancement

Hello developers. I found that the performance of the provided MP is not good. I compared it with [PatrickStar](https://github.com/Tencent/PatrickStar) and [DeepSpeed](https://github.com/microsoft/DeepSpeedExamples/tree/1fed12e8b375b0c54902827e7140d8266dfccd59/Megatron-LM-v1.1.5-ZeRO3). Can you check it with me? See MR #115 BTW: I...

enhancement

## Background Colossalai integrates a variety of parallel modes, and the tensor data structure of each parallel mode is different. Specifically: ZeRO: Wrap the data and grad of torch.nn.Parameter as a...

enhancement