# ColossalAI

Making large AI models cheaper, faster and more accessible

Results: 1072 ColossalAI issues, sorted by recently updated

## 📌 Checklist before creating the PR

- [x] I have created an issue for this PR for traceability
- [ ] The title follows the standard format: `[doc/gemini/tensor/...]: A...

### 🐛 Describe the bug

I am using this configuration as an example:

```python
plugin = HybridParallelPlugin(
    tp_size=2,
    pp_size=2,
    zero_stage=1,
    microbatch_size=1,
    num_microbatches=None,
    enable_jit_fused=False,
    enable_fused_normalization=True,
    enable_flash_attention=True,
    precision=mixed_precision,
    initial_scale=1,
)
```

The parameters needed...

bug
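
A minimal sketch of how such a plugin config is typically wired into a `Booster` (not the reporter's script; `mixed_precision` is assumed to be a string such as `"bf16"`, and the launch call varies by ColossalAI version):

```python
# Run under a distributed launcher, e.g.
#   colossalai run --nproc_per_node 4 script.py
# (tp_size=2 * pp_size=2 needs 4 ranks).
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import HybridParallelPlugin

colossalai.launch_from_torch(config={})  # older API; recent versions drop `config`

plugin = HybridParallelPlugin(
    tp_size=2,
    pp_size=2,
    zero_stage=1,
    microbatch_size=1,
    num_microbatches=None,
    enable_jit_fused=False,
    enable_fused_normalization=True,
    enable_flash_attention=True,
    precision="bf16",  # assumed value of `mixed_precision` in the report
    initial_scale=1,
)
booster = Booster(plugin=plugin)
# The user's objects are then wrapped once:
#   model, optimizer, criterion, dataloader, _ = booster.boost(model, optimizer, criterion, dataloader)
# Note: with pp_size > 1 the training step goes through
# booster.execute_pipeline(...) rather than a plain forward/backward.
```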

### 🐛 Describe the bug

I run my server with this: `python3 ./ColossalAI/applications/Chat/inference/server.py /home/ubuntu/modelpath/llama-7b/llama-7b/ --quant 8bit --http_host 0.0.0.0 --http_port 8080`. Then I call the API with this: import requests import...

bug
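
For context, a hedged sketch of calling such an inference server over HTTP; the `/generate` route and the payload field names are assumptions, not taken from the issue:

```python
import requests

# Both the route and the JSON fields below are guesses for illustration;
# check the server script for the actual endpoint and schema.
resp = requests.post(
    "http://0.0.0.0:8080/generate",
    json={"prompt": "Hello", "max_new_tokens": 64},
    timeout=60,
)
print(resp.status_code, resp.text)
```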

### 🐛 Describe the bug

```
RuntimeError: The server socket has failed to listen on any local network address. The server socket has failed to bind to [::]:29601 (errno: 98...
```

bug
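
errno 98 (`EADDRINUSE` on Linux) means the rendezvous port is already taken. A minimal sketch of probing for a free port to hand to the launcher's master-port option (the helper name is ours):

```python
import socket

def find_free_port() -> int:
    # Bind to port 0 and let the OS assign an unused port.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("", 0))
        return s.getsockname()[1]

print(find_free_port())  # e.g. pass this as the launcher's master port
```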

### 🐛 Describe the bug

When I use the booster API and the Gemini plugin to train PIDM, this error happens:

```python
File "train.py", line 167, in train
    booster.backward(loss, optimizer)
File...
```

bug
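
A minimal sketch of the standard `Booster`/`GeminiPlugin` training step this call sits in (placeholder model and data, not the reporter's PIDM code; the launch call varies by version):

```python
import torch
import torch.nn as nn
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import GeminiPlugin
from colossalai.nn.optimizer import HybridAdam

colossalai.launch_from_torch(config={})  # older API; recent versions drop `config`

model = nn.Linear(8, 8).cuda()
optimizer = HybridAdam(model.parameters(), lr=1e-3)
booster = Booster(plugin=GeminiPlugin())
model, optimizer, _, _, _ = booster.boost(model, optimizer)

x = torch.randn(4, 8, device="cuda")
loss = model(x).sum()
booster.backward(loss, optimizer)  # the call that fails in the report
optimizer.step()
optimizer.zero_grad()
```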

### 🐛 Describe the bug

I run:

```
colossalai run --nproc_per_node 8 finetune.py \
    --plugin "gemini_auto" \
    --dataset "/home/pdl/xlz/ColossalAI/data" \
    --model_path "/home/pdl/xlz/pretrain_weights/Colossal-LLaMA-2-7b-base" \
    --task_name "qaAll_final.jsonl" \
    --save_dir "./output" \
    --flash_attention \...
```

bug
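
For reference, a hedged sketch of what the `gemini_auto` plugin flag usually maps to in the example finetune scripts (the reporter's script may construct it with extra arguments):

```python
# Assumption: --plugin "gemini_auto" selects a GeminiPlugin with
# automatic placement; run under the colossalai launcher.
from colossalai.booster import Booster
from colossalai.booster.plugin import GeminiPlugin

plugin = GeminiPlugin(placement_policy="auto")
booster = Booster(plugin=plugin)
```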

### Discussed in https://github.com/hpcaitech/ColossalAI/discussions/5027

Originally posted by **jiejie1993** November 8, 2023

During multi-node, multi-GPU training an NCCL timeout occurs. torch provides `--max-restarts` to restart training, but how can the latest saved model be loaded automatically? Using `--load-checkpoint` requires every node to have the saved model, yet during training the model is only saved on the master node, and copying it to all nodes by hand makes automatic restarts impossible. Is there any way to automatically restart interrupted training and resume from the most recently saved model?
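
A minimal sketch of one way to get auto-resume (paths and helper names are ours, not from the discussion): write checkpoints to storage every node can see, then pick the newest one on restart:

```python
import glob
import os

CKPT_DIR = "/shared/checkpoints"  # assumed shared filesystem (e.g. NFS)

def latest_checkpoint(ckpt_dir: str):
    # Return the most recently modified checkpoint, or None if none exist.
    candidates = glob.glob(os.path.join(ckpt_dir, "epoch_*"))
    return max(candidates, key=os.path.getmtime) if candidates else None

ckpt = latest_checkpoint(CKPT_DIR)
if ckpt is not None:
    # Hand the path to the framework's checkpoint loader, e.g.
    # booster.load_model(model, ckpt) in ColossalAI's Booster API.
    print(f"resuming from {ckpt}")
else:
    print("no checkpoint found, starting fresh")
```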

### 📚 The doc issue

I want to replace Adam with SGD in [Colossal-LLaMA-2](https://github.com/hpcaitech/ColossalAI/tree/main/applications/Colossal-LLaMA-2) because I don't have enough GPUs but do have time to tune hyper-parameters. Are there any examples...

documentation
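
A minimal sketch of the swap, assuming the training script builds its optimizer in one place (typically HybridAdam); the hyper-parameters below are placeholders, not recommendations:

```python
import torch
import torch.nn as nn

model = nn.Linear(8, 8)  # stand-in for the LLaMA-2 model

# optimizer = HybridAdam(model.parameters(), lr=2e-5)  # original choice
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)
```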

### Describe the feature

I found that both examples truncate text longer than max_length, so we have to segment long texts into shorter ones. For examples/language/llama2, the code...

enhancement
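
A minimal sketch of segmenting instead of truncating (function name and chunking policy are ours): split the token ids into overlapping `max_length` windows so no text is dropped:

```python
from typing import List

def segment(ids: List[int], max_length: int, stride: int) -> List[List[int]]:
    # stride < max_length yields overlapping windows, which preserves
    # context across chunk boundaries.
    chunks = []
    for start in range(0, len(ids), stride):
        chunks.append(ids[start:start + max_length])
        if start + max_length >= len(ids):
            break
    return chunks

# Ten tokens, windows of 4 with stride 3:
# [[0, 1, 2, 3], [3, 4, 5, 6], [6, 7, 8, 9]]
print(segment(list(range(10)), max_length=4, stride=3))
```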