ColossalAI issues

[BUG]: /bin/bash: line 0: export: `=/usr/bin/supervisord': not a valid identifier Error: failed to run torchrun --nproc_per_node=1 --nnodes=1 --node_rank=0 --rdzv_backend=c10d --rdzv_endpoint=127.0.0.1:29500 --rdzv_id=colossalai-default-job train.py --use_trainer on 127.0.0.1, is localhost: True, exception: Encountered a bad command exit code!


### 🐛 Describe the bug

root@autodl-container-8450119b52-890be3f8:~# colossalai run --nproc_per_node 1 train.py --use_trainer
/bin/bash: line 0: export: `=/usr/bin/supervisord': not a valid identifier
Error: failed to run torchrun --nproc_per_node=1 --nnodes=1 --node_rank=0 --rdzv_backend=c10d...

tang-ed

bug
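Bash prints the `not a valid identifier` message for `=/usr/bin/supervisord` when it is handed `export =/usr/bin/supervisord`, that is, an assignment whose variable name is empty. One way this can happen is if a launcher builds `export NAME=VALUE` lines from the current environment and one entry has an empty or otherwise invalid name. The sketch below assumes that mechanism (it is not taken from ColossalAI's launcher code); `build_export_prefix` is a hypothetical helper that skips names bash would reject and quotes values with `shlex.quote`.

```python
import re
import shlex
from typing import Mapping

# A valid shell variable name: a letter or underscore, then letters, digits, underscores.
_VALID_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_export_prefix(env: Mapping[str, str]) -> str:
    """Hypothetical helper: build 'export K=V && ...', skipping names bash would reject."""
    parts = []
    for name, value in env.items():
        if not _VALID_NAME.fullmatch(name):
            # An empty name would otherwise become `export =/usr/bin/supervisord`.
            continue
        # Quote the value so spaces or shell metacharacters survive intact.
        parts.append(f"export {name}={shlex.quote(value)}")
    return " && ".join(parts)


print(build_export_prefix({"": "/usr/bin/supervisord", "PATH": "/usr/bin"}))
```

Filtering on the name (rather than the value) targets exactly the condition bash complains about, while quoting protects values that contain spaces.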

[BUG]: Chat step 3 has only one tokenizer; what if the actor and critic are two different models?


iMountTai
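If the actor and critic use different tokenizers, the token ids the actor generates are not meaningful to the critic. A minimal sketch, assuming a decode-then-re-encode approach: `ToyTokenizer` and `retokenize` are hypothetical stand-ins, not Coati classes. With real subword tokenizers the round trip through text may not be exact (special tokens, whitespace), and the two id sequences can differ in length, so any per-token masks would also need realigning.

```python
class ToyTokenizer:
    """Hypothetical whitespace tokenizer with its own vocabulary (not a Coati class)."""

    def __init__(self, vocab):
        self.token_to_id = {tok: i for i, tok in enumerate(vocab)}
        self.id_to_token = {i: tok for tok, i in self.token_to_id.items()}

    def encode(self, text):
        return [self.token_to_id[t] for t in text.split()]

    def decode(self, ids):
        return " ".join(self.id_to_token[i] for i in ids)


def retokenize(ids, src, dst):
    """Map ids from the source (actor) vocabulary to the destination (critic) one via text."""
    return dst.encode(src.decode(ids))


actor_tok = ToyTokenizer(["hello", "world", "!"])
critic_tok = ToyTokenizer(["!", "world", "hello"])  # same words, different ids

actor_ids = actor_tok.encode("hello world !")
critic_ids = retokenize(actor_ids, actor_tok, critic_tok)
print(actor_ids, critic_ids)
```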

experience_batch_size in PPO training

In `ColossalAI/applications/Chat/coati/trainer/ppo.py`:

`replay_buffer = NaiveReplayBuffer(train_batch_size, buffer_limit, buffer_cpu_offload)`

Because this buffer holds the generated experience data, should `train_batch_size` in the code above be `experience_batch_size`?

guijuzhejiang
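The distinction the question turns on can be sketched with a toy buffer. This assumes that the first argument of `NaiveReplayBuffer` is the batch size used when *sampling* training batches from the buffer, while `experience_batch_size` controls how many samples each experience-generation step adds; check Coati's `replay_buffer` module to confirm. `ToyReplayBuffer` is hypothetical.

```python
import random


class ToyReplayBuffer:
    """Hypothetical stand-in: stores experiences, samples fixed-size training batches."""

    def __init__(self, sample_batch_size, limit=0):
        self.sample_batch_size = sample_batch_size
        self.limit = limit  # 0 means unbounded
        self.items = []

    def append(self, experience_batch):
        self.items.extend(experience_batch)
        if self.limit > 0:
            # Keep only the most recent `limit` experiences.
            self.items = self.items[-self.limit:]

    def sample(self):
        return random.sample(self.items, self.sample_batch_size)


experience_batch_size = 8  # samples produced per generation step
train_batch_size = 4       # samples consumed per training step

buffer = ToyReplayBuffer(sample_batch_size=train_batch_size)
for step in range(2):
    buffer.append([f"exp{step}_{i}" for i in range(experience_batch_size)])
batch = buffer.sample()
print(len(buffer.items), len(batch))
```

Under that reading, the buffer's sampling size should match `train_batch_size`, and `experience_batch_size` only affects how quickly the buffer fills.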

[FEATURE]: Could you update the link for Webtext?


### Describe the feature

In https://github.com/hpcaitech/GPT-Demo?tab=readme-ov-file, under "How to Prepare Webtext Dataset": "You can download the preprocessed sample dataset for this demo via our [Google Drive sharing link](https://drive.google.com/file/d/1QKI6k-e2gJ7XgS8yIpgPPiMmwiBP_BPE/view?usp=sharing)." We can see...

SeekPoint

enhancement

[shardformer] HybridParallelPlugin: support gradient accumulation

## 📌 Checklist before creating the PR

- [x] I have created an issue for this PR for traceability
- [x] The title follows the standard format: `[doc/gemini/tensor/...]: A concise...

flybird11111
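Gradient accumulation splits one logical batch into micro-batches, adds up their scaled gradients, and applies a single optimizer update at the end. The plain-Python sketch below (a one-parameter least-squares model, not HybridParallelPlugin's implementation) checks that averaging gradients over equal-sized micro-batches reproduces the full-batch gradient.

```python
import math


def grad_mse(w, batch):
    """d/dw of the mean squared error of y ≈ w * x over (x, y) pairs."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_grad(w, micro_batches):
    """Accumulate per-micro-batch gradients, each scaled by 1 / num_micro_batches."""
    total = 0.0
    for mb in micro_batches:
        total += grad_mse(w, mb) / len(micro_batches)
    return total


data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2)]
w, lr = 0.5, 0.01

full = grad_mse(w, data)
accum = accumulated_grad(w, [data[:2], data[2:]])
assert math.isclose(full, accum)  # equal-sized micro-batches give the same gradient

w -= lr * accum  # one optimizer step after all micro-batches
print(full, accum, w)
```

In PyTorch the same idea is usually written as dividing each micro-batch loss by the number of accumulation steps, calling `backward()` on each, and calling `optimizer.step()` and `optimizer.zero_grad()` once per accumulation cycle.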

Fix logit processor and sampler


CjhHa1
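As a generic reference for what these two components do: a logit processor rescales or masks next-token logits (for example, temperature and top-k) before a sampler draws a token id from the resulting distribution. The sketch below is a plain-Python illustration, not ColossalAI's inference code; `process_logits` and `sample` are hypothetical names.

```python
import math
import random


def process_logits(logits, temperature=1.0, top_k=0):
    """Scale by temperature, then mask all but the top_k logits (0 = keep all).

    Ties at the k-th value are all kept, so slightly more than k may survive.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [l / temperature for l in logits]
    if 0 < top_k < len(scaled):
        threshold = sorted(scaled, reverse=True)[top_k - 1]
        scaled = [l if l >= threshold else -math.inf for l in scaled]
    return scaled


def sample(logits, rng=random):
    """Draw a token id with probability softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - m) for l in logits]  # masked (-inf) entries get weight 0.0
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


rng = random.Random(0)
masked = process_logits([1.0, 3.0, 2.0], temperature=1.0, top_k=1)
print(masked, sample(masked, rng))
```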

Implement Triton kernels for inference

Tracking the implementation of Triton kernels compatible with the relevant submodules and KVCache for inference:

- Context-stage Attention: https://github.com/hpcaitech/ColossalAI/pull/5192
- Decoding-stage Attention
- Pos Embedding: https://github.com/hpcaitech/ColossalAI/pull/5181
- KVCache Copy

yuanheng-zhao

enhancement
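When testing kernels like these, a slow reference implementation is a common correctness oracle. The sketch below is a plain-Python, single-head, unbatched version of decoding-stage attention: the new token's key and value are appended to the cache (one plausible reading of "KVCache Copy"; the real kernel's memory layout is not described here), then the single query attends over all cached positions with scaled dot-product softmax. It is not the Triton implementation and ignores batching, padding, and multiple heads; `decode_attention` and `decode_step` are hypothetical names.

```python
import math


def decode_attention(q, k_cache, v_cache):
    """Attention of one query vector over all cached keys/values (single head)."""
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in k_cache]
    m = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    probs = [w / total for w in weights]
    # Weighted sum of value vectors, one output component at a time.
    return [sum(p * v[j] for p, v in zip(probs, v_cache)) for j in range(len(v_cache[0]))]


def decode_step(q, k_new, v_new, k_cache, v_cache):
    """Copy the new token's key/value into the cache, then attend over it."""
    k_cache.append(k_new)
    v_cache.append(v_new)
    return decode_attention(q, k_cache, v_cache)


k_cache, v_cache = [], []
out = decode_step([1.0, 0.0], [1.0, 0.0], [2.0, 3.0], k_cache, v_cache)
print(out)
```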



Test offline continuous batching, consider benchmarking

[workflow] fixed build CI

Implement speculative decoding
