marsggbo
### 🐛 Describe the bug ZeRO doesn’t support models with a dynamic forward. I modified this [example](https://github.com/hpcaitech/ColossalAI-Examples/blob/main/features/zero/train.py) by using the following model, which has a dynamic forward function: ```python class...
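The original model code is truncated above, so here is a minimal, hypothetical sketch of what a "dynamic forward" typically means in this context: a `forward` whose control flow (and therefore the set of parameters exercised) depends on the input values, which is the pattern the issue says ZeRO struggles with. The class and layer names below are illustrative, not taken from the issue.

```python
import torch
import torch.nn as nn

class DynamicNet(nn.Module):
    """Hypothetical model whose forward path depends on the input data."""

    def __init__(self):
        super().__init__()
        self.branch_a = nn.Linear(8, 8)
        self.branch_b = nn.Linear(8, 8)

    def forward(self, x):
        # Data-dependent branching: which parameters participate in the
        # forward (and backward) pass can change from batch to batch,
        # which sharded optimizers like ZeRO may not handle.
        if x.mean() > 0:
            return self.branch_a(x)
        return self.branch_b(x)

model = DynamicNet()
out = model(torch.randn(2, 8))
```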
### 🐛 Describe the bug https://github.com/hpcaitech/ColossalAI/blob/61f31c3cf01a8db5084401e5f93f52a8f6bcb185/colossalai/logging/logger.py#L127 `_FORMAT` is not defined ### Environment _No response_
``` >>> pip install torch==0.2.0_4 torchvision Collecting torch==0.2.0_4 Could not find a version that satisfies the requirement torch==0.2.0_4 (from versions: 0.1.2, 0.1.2.post1, 0.3.1, 0.4.0, 0.4.1) No matching distribution found for...
In the original code, I tried to use my own dataset, which is larger than CIFAR10. My image shapes are: - train: 1993*3*224*224 - valid: 993*3*224*224 - test : 993*3*224*224...
**Describe the bug** What could be the reason for the error below? ``` File "/home/xihe/xinhe/distNAS/DeepspeedNAS/train.py", line 200, in train_zero engine.backward(loss) File "/home/xihe/xinhe/deepspeed/DeepSpeed/deepspeed/utils/nvtx.py", line 11, in wrapped_fn ret_val = func(*args, **kwargs) File "/home/xihe/xinhe/deepspeed/DeepSpeed/deepspeed/runtime/engine.py",...
### 🐛 Describe the bug # Setup I am currently running the Colossalai/examples/language/openmoe project with the following experimental setup: - datasets: `load_dataset("yizhongw/self_instruct", "super_natural_instructions")`, I also tried `"wikitext-2"` - model: `openmoe-base`...
It seems that the provided code is based on a single GPU. Are there any tutorials for fine-tuning mistral-moe with expert/data/pipeline parallelism?
Editors now render formulas and images automatically. Is there any way to turn that off? I couldn't find the setting.
For example, the versions of Python and torch, etc. Thanks.