ColossalAI
Making large AI models cheaper, faster and more accessible
## Describe the problem In version 0.1.7, I found that the Op hook leads to a memory leak. If you use the hook on an nn.Module, even though it's a dummy hook, more...
### 🐛 Describe the bug When testing [DeTr on Colossal-Example](https://github.com/hpcaitech/ColossalAI-Examples/tree/main/image/detr), I encountered an issue that model with only DDP in situations: 1. `LEARNING_RATE=1e-4`, `world_size=4` 2. `LEARNING_RATE=2e-4`, `world_size=8` 3. `LEARNING_RATE=1e-4`, `world_size=8`...
### 🐛 Describe the bug I run the following script and get `RuntimeError: Function CudnnRnnBackward0 returned an invalid gradient at index 1 - got [0] but expected shape compatible with...
### 🐛 Describe the bug I run the following script and it reports `Found dtype Float but expected Half`. It turns out that `y_hat` is of type fp16, but `y`...
### 🐛 Describe the bug ZeRO will keep throwing overflow if used together with momentum SGD in the [resnet example](https://github.com/hpcaitech/ColossalAI-Examples/tree/main/image/resnet). The code works fine with all kinds of amp. ###...
### 📚 The doc issue Hi Colossal-AI developers, Thank you for your amazing work! Would you consider creating a Colab tutorial page? I think it can allow users to experiment...
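### Describe the feature styleganXL is a general-purpose training framework that supports stylegan3 and stylegan2ada, and its code has also been simplified. It would make a great example; its code was released just a few weeks ago.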
This log line confuses me: my batch size is 513 and the iteration time is 98.83, so the throughput should be 5.19. Obviously, the logs of iteration time and throughput...
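The arithmetic behind the expected figure can be checked with a minimal sketch. It assumes throughput is simply batch size divided by iteration time; the helper name `throughput` is hypothetical and is not part of ColossalAI's logging code, and the units are whatever the log uses.

```python
def throughput(batch_size: int, iteration_time: float) -> float:
    """Samples processed per unit of time for a single iteration.

    Assumption: throughput = batch_size / iteration_time. This helper is
    illustrative only and is not taken from ColossalAI.
    """
    if iteration_time <= 0:
        raise ValueError("iteration_time must be positive")
    return batch_size / iteration_time


# Values quoted in the issue: batch size 513, iteration time 98.83.
expected = throughput(513, 98.83)
print(f"{expected:.2f}")  # prints 5.19
```

If the logged throughput differs from this value, one possible explanation is that the two numbers are averaged over different windows, but the excerpt does not say.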
### Describe the feature We currently set the spec on each parameter, which means each parameter has its own fixed compute_pattern. However, some models, like GPT-2, share parameters among different layers. GPT-2...