Shuo Zhang

10 comments by Shuo Zhang

@tjruwase My devices are 8*A100 (80G) with 1024G of RAM, and I have found another solution: I noticed that `pin_memory: false` in `ds_config` didn't do anything, so I add...
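For reference, `pin_memory` lives under the ZeRO offload section of the DeepSpeed config. A minimal sketch of that layout, with the stage and device values as illustrative assumptions (the original comment does not show the full config):

```python
# Illustrative ds_config fragment (assumed values); pin_memory controls whether
# the CPU-side offload buffers are allocated in page-locked memory.
ds_config = {
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {
            "device": "cpu",
            "pin_memory": False,  # the setting discussed above
        },
    },
}
```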

Hi @x54-729 I met exactly the same problem, and I noticed that it works fine when I disable the **offload optimizer** feature. Not sure why.
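For anyone comparing configs: disabling optimizer offload just means leaving the `offload_optimizer` block out of `zero_optimization` entirely; a sketch, with the ZeRO stage as an assumed value:

```python
# Sketch: optimizer offload disabled by omitting "offload_optimizer" altogether.
ds_config = {
    "zero_optimization": {
        "stage": 3,  # assumed; keep whatever stage you already use
    },
}
```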

Hi @ctjian This happens because neither your CPU memory nor your GPU memory can hold the model, so part of its parameters has to be offloaded to disk. You can try:
1. Use a GPU with more memory, or use more GPUs. Set the environment variable `CUDA_VISIBLE_DEVICES` to control how many GPUs the model can use, e.g. at [this location](https://github.com/OpenLMLab/MOSS/blob/5775a3ef16338550efc96fdf7da06a45de69af3e/moss_cli_demo.py#L2);
2. Increase the available CPU memory;
3. Set the `offload_folder` argument of `load_checkpoint_and_dispatch`, e.g. at [this location](https://github.com/OpenLMLab/MOSS/blob/5775a3ef16338550efc96fdf7da06a45de69af3e/moss_cli_demo.py#L31) (note that this may severely slow down inference); see the sketch below.
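A minimal sketch of options 1 and 3 together, using `accelerate`'s `load_checkpoint_and_dispatch`; the model name, checkpoint path, and offload directory below are illustrative assumptions:

```python
import os

# Option 1: choose which GPUs are visible (must be set before CUDA initializes).
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"  # illustrative

from accelerate import init_empty_weights, load_checkpoint_and_dispatch
from transformers import AutoConfig, AutoModelForCausalLM

# Build the model skeleton without allocating real weights.
config = AutoConfig.from_pretrained("fnlp/moss-moon-003-sft", trust_remote_code=True)
with init_empty_weights():
    model = AutoModelForCausalLM.from_config(config, trust_remote_code=True)

# Option 3: weights that fit in neither GPU nor CPU memory spill to disk.
model = load_checkpoint_and_dispatch(
    model,
    "path/to/checkpoint",        # illustrative; point at the downloaded weights
    device_map="auto",
    offload_folder="./offload",  # disk offload directory (slow, but loads)
)
```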

Half-precision inference cannot run on the CPU; you need to move both `model` and the inputs to your GPU, or set the `model`'s `dtype` to `torch.float32`.

You may need to run:
```python
model = model.cuda()
inputs["input_ids"] = inputs["input_ids"].cuda()
inputs["attention_mask"] = inputs["attention_mask"].cuda()
```
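Or, for the CPU route mentioned above, a one-line sketch that upcasts a half-precision model to `torch.float32`:

```python
model = model.float()  # upcast fp16 weights to fp32 so CPU inference works
```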

InternLM2-1.8B has been open-sourced.
- internlm/internlm2-1_8b: https://huggingface.co/internlm/internlm2-1_8b
- internlm/internlm2-chat-1_8b-sft: https://huggingface.co/internlm/internlm2-chat-1_8b-sft

@xiaopqr Hi, sorry for the inconvenience this has caused. The issue was fixed in 1871bcb26a4d879a25914e3daf909dc0ee636053; please use the dev branch, or wait for the next release to be merged into the main branch.

Hi @DesperateExplorer, Collie can use models from transformers in the case of ZeRO parallelism, but you need to call ``setup_distribution`` manually:
```python
from collie import setup_distribution, CollieConfig
from transformers...
```
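A fuller sketch of how that call might fit together; `setup_distribution` and `CollieConfig` come from the comment above, while the model name, parallel degree, and ZeRO stage are assumptions to check against the CoLLiE docs:

```python
from collie import setup_distribution, CollieConfig
from transformers import AutoModelForCausalLM

# Assumed example: run a Hugging Face model under CoLLiE's ZeRO data parallelism.
config = CollieConfig.from_pretrained("internlm/internlm2-1_8b", trust_remote_code=True)
config.dp_size = 8                                      # data-parallel degree (assumed)
config.ds_config = {"zero_optimization": {"stage": 3}}  # DeepSpeed ZeRO settings (assumed)

# Initialize the distributed environment before instantiating the model.
setup_distribution(config)

model = AutoModelForCausalLM.from_pretrained(
    "internlm/internlm2-1_8b", trust_remote_code=True
)
```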