ColossalAI-Examples

Examples of training models with hybrid parallelism using ColossalAI

Results 37 ColossalAI-Examples issues

### 🐛 Describe the bug

I ran into an overflow when using the official scripts for GPT2. Is this expected?

```
cd XXX/ColossalAI/examples/language/gpt
export DATA=/data/scratch/gpt_data/small-gpt-dataset.json
torchrun --standalone --nproc_per_node=1 train_gpt.py --config=gpt2_configs/gpt2_zero3.py --from_torch...
```

### 🐛 Describe the bug

I am trying the `colo_vit` example but got this error:

```
Traceback (most recent call last):
  File "", line 1, in
ImportError: cannot import name 'colo_state_dict'...
```

### 🐛 Describe the bug

NVIDIA DeepLearningExamples removed LDDL from DLE tools on Aug 16, 2022. As a result, the guide at https://github.com/hpcaitech/ColossalAI-Examples/tree/main/language/bert/preprocessing no longer works in the following respects:

1. pip...

### 📚 The doc issue

We provide a [runnable example](https://github.com/hpcaitech/ColossalAI-Examples/tree/main/features/gradient_handler) to demonstrate the use of gradient handler. In this example, we used DataParallelGradientHandler instead of PyTorch DistributedDataParallel for data parallel...

### 🐛 Describe the bug

I found a runtime error while running the code:

The client socket has failed to connect to any network address of (hcp-bb-03, 52873). The client...

### 🐛 Describe the bug

I tried to run the command from https://github.com/hpcaitech/ColossalAI-Examples/tree/main/language/opt, but errors occurred.

```python
Traceback (most recent call last):
  File "run_clm.py", line 44, in
    from...
```

### 🐛 Describe the bug

### Environment

- torch==1.12.0a0+8a1a93a
- num_gpu=4
- pipeline=4
- model = gpt2-small

### 🐛 Describe the bug

When running the OPT example, I got the following error:

```
AttributeError: type object 'ChunkManager' has no attribute 'search_chunk_size'
```

This is caused by an outdated...

# Patching CVE-2007-4559

Hi, we are security researchers from the Advanced Research Center at [Trellix](https://www.trellix.com). We have begun a campaign to patch a widespread bug named CVE-2007-4559. CVE-2007-4559 is a...
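For readers unfamiliar with the identifier in the issue above: CVE-2007-4559 concerns path traversal in Python's `tarfile` module, where an archive member whose name contains `..` can be written outside the target directory on extraction. The following is a minimal sketch of one common mitigation, not the patch proposed in the issue. The helper names `is_within_directory` and `safe_extract` are hypothetical and chosen for illustration only.

```python
import os
import tarfile


def is_within_directory(directory: str, target: str) -> bool:
    """Return True if `target` resolves to a path inside `directory`."""
    abs_directory = os.path.abspath(directory)
    abs_target = os.path.abspath(target)
    # commonpath equals the base directory only when target lies under it
    return os.path.commonpath([abs_directory, abs_target]) == abs_directory


def safe_extract(tar: tarfile.TarFile, path: str = ".") -> None:
    """Hypothetical helper: validate every member name, then extract.

    Only member names are checked here. Symlink and hardlink targets are
    not inspected, so this is a partial safeguard rather than a complete one.
    """
    for member in tar.getmembers():
        member_path = os.path.join(path, member.name)
        if not is_within_directory(path, member_path):
            raise ValueError(f"Blocked path traversal in tar member: {member.name}")
    tar.extractall(path)
```

On Python 3.12 and later, the standard library's extraction filters (for example `tar.extractall(path, filter="data")`) cover this case and more, and are likely a better choice where available.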