Hongxin Liu
DON'T merge to main. Create a new feature branch on the org repo and merge to it.
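A minimal sketch of that workflow, assuming the org repo is the `origin` remote and using a hypothetical branch name `feature/my-change`:

```bash
# Create a feature branch and push it to the org repo (branch name is a placeholder).
git checkout -b feature/my-change
git push origin feature/my-change
# Then open the PR against feature/my-change instead of main.
```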
Could you reinstall the latest version of colossalai?
Does it compare with apex's implementation? We've integrated some apex CUDA kernels, and some of them are also implemented in Liger-Kernel.
DeepSpeed ZeRO-3 fully shards the weights, while TP does not shard everything (e.g., non-Linear/Embedding layers). This can happen when activations are small; please provide more details.
As the LoRA weights are initialized randomly.
> @Edenzzzz I am using [script](https://github.com/hpcaitech/ColossalAI/blob/v0.3.6/examples/language/llama2/finetune.py). I am using dataset [yizhongw/self_instruct](https://huggingface.co/datasets/yizhongw/self_instruct)
>
> Eval logs for model trained with Hybrid Parallel plugin and pp_size=4 and tp_size=4
>
> ```
> ...
Is `NPU-VISIBLE-DEVICES` an environment variable you set locally? Shouldn't the correct format be `NPU_VISIBLE_DEVICES`?
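A minimal sketch assuming the underscore form is the one your launcher reads; the device IDs are placeholders:

```bash
# Use underscores, not dashes, in the variable name.
export NPU_VISIBLE_DEVICES=0,1,2,3
```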
We provide an Ascend Torch base image: `docker pull hpcaitech/pytorch-npu:2.4.0`. On top of it, just install colossalai directly: install the latest stable release with `pip install colossalai`, or install the main branch with `pip install git+https://github.com/hpcaitech/ColossalAI.git`.
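Putting those commands together, a minimal sketch of the flow; the `docker run` flags (device and driver mounts) are assumptions for a typical Ascend host and may need adjusting:

```bash
# Pull the Ascend Torch base image.
docker pull hpcaitech/pytorch-npu:2.4.0

# Start a container (the NPU device and driver mounts below are assumptions).
docker run -it --network host \
    --device /dev/davinci0 --device /dev/davinci_manager \
    -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
    hpcaitech/pytorch-npu:2.4.0 bash

# Inside the container: latest stable release...
pip install colossalai
# ...or the main branch.
pip install git+https://github.com/hpcaitech/ColossalAI.git
```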
flash_attn is not available on NPU devices. DON'T install flash_attn; instead, create a dummy package directory in your Python site-packages path so the import check passes. E.g.
```bash
mkdir .conda/envs/myenv/lib/python3.10/site-packages/flash_attn
touch .conda/envs/myenv/lib/python3.10/site-packages/flash_attn/__init__.py
```
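If your environment path differs from the example above, a sketch that locates the site-packages directory automatically (assuming the target environment's Python is the one on `PATH`):

```bash
# Find the active environment's site-packages directory and create the stub package there.
SITE_PACKAGES=$(python -c "import sysconfig; print(sysconfig.get_paths()['purelib'])")
mkdir -p "$SITE_PACKAGES/flash_attn"
touch "$SITE_PACKAGES/flash_attn/__init__.py"
```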
Hi, could you reinstall the latest colossalai?