一枚嘉应子 comments

Results: 8 comments by 一枚嘉应子

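[BUG]: Training time is the same on a single machine with one GPU and with eight GPUs

Hello, I have run into the same problem. I tested with the project at https://github.com/hpcaitech/ColossalAI-Examples/tree/main/features/tensor_parallel. For example, a single GPU uses 1707m of GPU memory, so with four-GPU tensor parallelism each GPU should use about 430m, but in my test each GPU actually uses 1723m. I changed only the key parameter (nproc per node) and left everything else unchanged. Is my understanding or my procedure wrong? Please correct me, thanks.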

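[BUG]: Training time is the same on a single machine with one GPU and with eight GPUs

> > Hello, I have run into the same problem. I tested with the project at https://github.com/hpcaitech/ColossalAI-Examples/tree/main/features/tensor_parallel. For example, a single GPU uses 1707m of GPU memory, so with four-GPU tensor parallelism each GPU should use about 430m, but in my test each GPU actually uses 1723m. I changed only the key parameter (nproc per node) and left everything else unchanged. Is my understanding or my procedure wrong? Please correct me, thanks.

> > Do you see this with the official code as well? This problem has bothered us for a long time: with one GPU and with eight GPUs on a single machine, both memory usage and run time are identical, as if the parallelism were fake. If you manage to solve it, please be sure to reply to me. Thank you very much!!

Not solved yet, and I did not record run time in my experiments. When I verified the official code, the tensors were split correctly, but the GPU memory usage does not add up. For example, with four GPUs: under 1D splitting, an input of 256*1024 is split into tensors of 256*256; under 2D splitting, an input of 256*1024 is split into tensors of 128*512.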
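The shard shapes and the naive memory expectation quoted above can be reproduced with a few lines of arithmetic. This is only a sketch: the function names are hypothetical and not part of ColossalAI. It assumes 1D splitting divides the last dimension evenly across all GPUs and 2D splitting uses a square q x q grid.

```python
import math


def shard_shape_1d(rows, cols, world_size):
    """Per-GPU shape when only the last dimension is split (assumed 1D scheme)."""
    if cols % world_size != 0:
        raise ValueError("cols must be divisible by world_size")
    return (rows, cols // world_size)


def shard_shape_2d(rows, cols, world_size):
    """Per-GPU shape when both dimensions are split over a square q x q grid."""
    q = math.isqrt(world_size)
    if q * q != world_size:
        raise ValueError("world_size must be a perfect square for a 2D grid")
    if rows % q != 0 or cols % q != 0:
        raise ValueError("rows and cols must be divisible by the grid side")
    return (rows // q, cols // q)


def ideal_per_gpu_mb(single_gpu_mb, world_size):
    """Naive expectation: everything that used memory on one GPU is divided evenly."""
    return single_gpu_mb / world_size


print(shard_shape_1d(256, 1024, 4))      # (256, 256)
print(shard_shape_2d(256, 1024, 4))      # (128, 512)
print(round(ideal_per_gpu_mb(1707, 4)))  # 427
```

The last figure is an idealized lower bound, not a prediction: anything that is not sharded is not divided by the number of GPUs. The sketch does not by itself explain the 1723m observed per GPU.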

[BUG]: failed to run ..ColossalAI/examples/language/gpt/gemini

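When running the chatGPT project, it stays stuck at "No pre-built kernel is found, build and load the cpu_adam kernel during runtime now". GPU memory shows that the model has been loaded, but there is no further progress. Could this be related to being on an intranet with no internet access?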

[BUG]: failed to run ..ColossalAI/examples/language/gpt/gemini

Colossal-AI version: 0.2.5 PyTorch version: **1.13.1** CUDA version: **10.1** CUDA version required by PyTorch: **11.7** Note: The table above checks the versions of the libraries/tools in the current environment If...
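The version report above lists a local CUDA of 10.1 against a PyTorch build that expects 11.7. A minimal sketch of that comparison follows. The helper names are hypothetical, and the version strings are passed in by hand. In a real environment they could come from `torch.version.cuda` and the output of `nvcc --version`, which is an assumption about how you would gather them, not something the report states.

```python
def major_minor(version):
    """Parse a version string such as '11.7' or '10.1.243' into (major, minor)."""
    parts = version.strip().split(".")
    return int(parts[0]), int(parts[1])


def cuda_versions_differ(local_cuda, torch_cuda):
    """True when the local toolkit and the PyTorch build disagree on major.minor."""
    return major_minor(local_cuda) != major_minor(torch_cuda)


# Values taken from the report above.
print(cuda_versions_differ("10.1", "11.7"))  # True
```

Whether such a mismatch is what blocks the runtime kernel build in this case is not established by the report. The check only makes the discrepancy explicit.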

[BUG]: failed to run ..ColossalAI/examples/language/gpt/gemini

cuda 10.1 torch 1.10.0 torchvision 0.11.1 It works

fig_v2 raises an error

Normalize first, then convert to tuple format. I ran a small point cloud and it still had not finished loading after using 5GB of memory... ` def fig_v2(structure, colors): colors = colors / 255 colors = colors.tolist() for i in range(len(structure)): rgb = tuple(colors[i]) mlab.points3d(structure[i][0], structure[i][1], structure[i][2], mode = 'point', name = 'dinosaur', color...

Hardware requirements for GLM-chinese-10B

> > > For finetuning, the optimizer states consume a lot of memory. You can enable ZeRO-Offload (https://www.deepspeed.ai/tutorials/zero-offload/) to offload the optimizer states to the CPU memory. By default, we...

Hardware requirements for GLM-chinese-10B

> > > > How much memory do you have?

ZeRO-2 + cpu offload=True + batch=1 + fp16. GPUs: V100(32GB) * 2 + 3090(24GB) * 4 = 160GB. Host memory before launch: total: 376GB, used: 21GB, free: 214GB. building GLM model ... Memory usage rises from 21GB to 97GB. DeepSpeed...
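The GPU total above, and a rough idea of why optimizer states dominate during finetuning, can be worked out as follows. This is a back-of-the-envelope sketch under stated assumptions: the model is taken to have exactly 10 billion parameters (inferred from the "10B" in the name), training uses mixed-precision Adam with the commonly cited 2 + 2 + 12 bytes per parameter, and 1 GB is counted as 1e9 bytes. Activations and framework overhead are ignored, and the function names are hypothetical.

```python
BYTES_PER_GB = 1e9  # assumption: decimal gigabytes


def mixed_precision_adam_memory_gb(num_params):
    """Rough model-state memory for mixed-precision Adam, in GB."""
    return {
        "fp16_params": 2 * num_params / BYTES_PER_GB,
        "fp16_grads": 2 * num_params / BYTES_PER_GB,
        # fp32 master weights + momentum + variance: 4 + 4 + 4 bytes per parameter
        "optimizer_states": 12 * num_params / BYTES_PER_GB,
    }


def total_gpu_memory_gb(gpus):
    """Sum memory over (memory_gb, count) pairs."""
    return sum(memory_gb * count for memory_gb, count in gpus)


estimate = mixed_precision_adam_memory_gb(10_000_000_000)  # assumed 10B parameters
print(estimate["fp16_params"])       # 20.0
print(estimate["optimizer_states"])  # 120.0
print(total_gpu_memory_gb([(32, 2), (24, 4)]))  # 160
```

Under these assumptions the optimizer states alone come to about 120 GB, which is the part ZeRO-Offload moves to CPU memory.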