Jiarui Fang（方佳瑞） issues

Results: 63 issues of Jiarui Fang（方佳瑞）

[training] Fails to run the huggingface example when batch size is 1.

Hello, I tried to run the example in `lightseq/examples/training/huggingface`. Because I use a gaming PC, I slightly modified the `run_ner.sh` script (two lines as follows). ``` python3 -m torch.distributed.launch...

ONNXRT cannot be applied to Albert

/opt/conda/lib/python3.7/site-packages/torch/onnx/utils.py:738: UserWarning: ONNX export failed on ATen operator einsum because torch.onnx.symbolic_opset9.einsum does not exist .format(op_name, opset_version, op_name)) multiprocessing.pool.RemoteTraceback: """ Traceback (most recent call last): File "/opt/conda/lib/python3.7/multiprocessing/pool.py", line 121, in worker...

Refactor MultiheadAttention and other Layers

The logic inside the MultiheadAttention layer is currently too complex for development. Moreover, there are bugs in the intermediate management. Rewriting this code is the first priority, to make others...

enhancement

help wanted

Developing CPU INT8 quantization

Using FBGEMM to support CPU quantization.

documentation

enhancement

help wanted

[FEATURE]: Basic Function of ColoTensors

### Describe the feature I found that ColoTensor lacks some basic functionalities. - [x] initialized in shard mode from a torch tensor. - [x] save and load in a distributed...

enhancement

ZeRO does not initialize weights correctly

### 🐛 Describe the bug I assume that ZeRO does not initialize model parameters correctly. 1. In ZeroInitContext, we adapt a torch param to type ShardedParamV2 when a param is constructed....

bug

[RFC] A tracer to monitor the memory usage during training

### Describe the feature I propose to implement a runtime memory tracer. It can be turned on/off by users. It traces the GPU memory footprint during the training process, or...

help wanted

[Discussion] About 3D Parallelism

I read the paper [Maximizing Parallelism in Distributed Training for Huge Neural Networks](https://arxiv.org/abs/2105.14450). The idea is elegant and does make sense to me. However, I just wonder about the compatibility...

enhancement

The performance of model parallelism (MP) is not good

Hello developers. I found that the performance of the provided MP is not good. I compared it with [PatrickStar](https://github.com/Tencent/PatrickStar) and [DeepSpeed](https://github.com/microsoft/DeepSpeedExamples/tree/1fed12e8b375b0c54902827e7140d8266dfccd59/Megatron-LM-v1.1.5-ZeRO3). Can you check it with me? See MR #115 BTW: I...

enhancement

[RFC] Unified Tensor Structure

## Background Colossalai integrates a variety of parallel modes, and the tensor data structure of each parallel mode is different. Specifically, ZeRO: Wrap the data and grad of torch.nn.Parameter as a...

enhancement