When I use hybrid_parallel and set enable_fused_normalization = True, I can't run the code. Here is the error: RuntimeError: Failed to replace input_layernorm of type LlamaRMSNorm with FusedRMSNorm with the exception: Please install apex from source (https://github.com/NVIDIA/apex) to use the fused RMS normalization kernel. Please check your model configuration or sharding policy, you can set up an issue for us to help you as well. However, I have installed apex and the error still occurs. How can I solve it?

Open chensimian opened this issue 1 year ago • 9 comments

🐛 Describe the bug

raise RuntimeError( RuntimeError: Failed to replace input_layernorm of type LlamaRMSNorm with FusedRMSNorm with the exception: Please install apex from source (https://github.com/NVIDIA/apex) to use the fused RMS normalization kernel. Please check your model configuration or sharding policy, you can set up an issue for us to help you as well.

Environment

    plugin = HybridParallelPlugin(
        tp_size=8, 
        pp_size=1,
        num_microbatches=None,
        microbatch_size=1,
        enable_fused_normalization=True,  # the setting that triggers the error
        enable_jit_fused=True,
        enable_flash_attention=True,
        check_reduction=True,
        gradient_as_bucket_view=True,
        find_unused_parameters=True,
        zero_stage=0,
        precision="bf16",  # fp32
        initial_scale=1,
    )

chensimian avatar Nov 17 '23 02:11 chensimian

Hi, please install apex from https://github.com/NVIDIA/apex, or set enable_fused_normalization to False.

flybird11111 avatar Nov 17 '23 02:11 flybird11111
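The second option above (turning the flag off) can be made automatic. The following is a minimal sketch, not ColossalAI's own logic: it assumes that the compiled extension is importable as fused_layer_norm_cuda (the module name that appears in the error reported later in this thread), and fused_norm_available is a hypothetical helper name. The other plugin arguments are copied from the configuration in the issue.

```python
import importlib.util


def fused_norm_available():
    """Return True only if both apex and its compiled layer-norm extension are discoverable.

    Hypothetical helper. find_spec only locates the modules; it does not import
    them, so it cannot detect a build that exists but fails to load.
    """
    return (
        importlib.util.find_spec("apex") is not None
        and importlib.util.find_spec("fused_layer_norm_cuda") is not None
    )


# Arguments taken from the issue; only the fused-normalization flag is computed.
plugin_kwargs = dict(
    tp_size=8,
    pp_size=1,
    precision="bf16",
    enable_fused_normalization=fused_norm_available(),
)

# With ColossalAI installed, the kwargs would be passed on like this:
# from colossalai.booster.plugin import HybridParallelPlugin
# plugin = HybridParallelPlugin(**plugin_kwargs)
```

With this, a machine without the compiled kernel falls back to the regular LlamaRMSNorm instead of failing at startup.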

I have installed it, but it is not working.

chensimian avatar Nov 17 '23 02:11 chensimian

Maybe the apex version is not correct. Can you try running "from apex.normalization import FusedRMSNorm"?

flybird11111 avatar Nov 17 '23 03:11 flybird11111

Me too !!

RuntimeError: Failed to replace input_layernorm of type LlamaRMSNorm with FusedRMSNorm with the exception: No module named 'fused_layer_norm_cuda'. Please check your model configuration or sharding policy, you can set up an issue for us to help you as well.

yeegnauh avatar Nov 17 '23 08:11 yeegnauh
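The "No module named 'fused_layer_norm_cuda'" message suggests that the Python part of apex is present but the compiled CUDA extension is not. A small probe like the one below can show which of the two is missing. It is only a sketch: probe_modules is a hypothetical helper, and the list of module names is an assumption based on the errors quoted in this thread.

```python
import importlib


def probe_modules(names):
    """Try to import each module and record either 'ok' or the error raised."""
    report = {}
    for name in names:
        try:
            importlib.import_module(name)
            report[name] = "ok"
        except Exception as exc:  # ImportError, or a loader error from a broken build
            report[name] = f"{type(exc).__name__}: {exc}"
    return report


if __name__ == "__main__":
    # Module names assumed from the error messages in this thread.
    modules = ["torch", "apex", "apex.normalization", "fused_layer_norm_cuda"]
    for name, status in probe_modules(modules).items():
        print(f"{name:28s} {status}")
```

If apex.normalization reports ok while fused_layer_norm_cuda fails, the likely cause is an apex install that skipped the CUDA extension build, so a plain import of FusedRMSNorm can succeed while the kernel is still unavailable.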

I also saw this message in examples/language/llama2/scripts/benchmark_70B/3d.sh:

# TODO: fix this
echo "3D parallel for LLaMA-2 is not ready yet"

Does this mean that even if I install apex correctly, I won't be able to use hybrid_parallel properly?

yeegnauh avatar Nov 17 '23 08:11 yeegnauh

Hybrid parallelism works normally now. Could you start Python and run "from apex.normalization import FusedRMSNorm" to see whether it succeeds?

flybird11111 avatar Nov 17 '23 08:11 flybird11111

Yes, python -c "from apex.normalization import FusedRMSNorm" runs successfully.

yeegnauh avatar Nov 17 '23 08:11 yeegnauh

Please try this: https://blog.csdn.net/iteapoy/article/details/117389407

flybird11111 avatar Nov 17 '23 08:11 flybird11111

Can you share your pip list output and your CUDA version?

flybird11111 avatar Nov 21 '23 08:11 flybird11111
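The requested details can be collected in one step. The script below is a sketch: collect_env is a hypothetical helper, and the package list is an assumption about what is relevant here. It reads installed versions from package metadata and, if torch can be imported, the CUDA version torch was built against.

```python
import importlib.metadata as md
import platform


def collect_env(packages=("torch", "apex", "colossalai", "transformers")):
    """Gather the Python version, selected package versions, and torch's CUDA version."""
    info = {"python": platform.python_version()}
    for pkg in packages:
        try:
            info[pkg] = md.version(pkg)
        except md.PackageNotFoundError:
            info[pkg] = "not installed"
    try:
        import torch

        # CUDA version torch was compiled with (None for CPU-only builds).
        info["torch.version.cuda"] = str(torch.version.cuda)
        info["cuda_available"] = str(torch.cuda.is_available())
    except Exception as exc:
        info["torch.version.cuda"] = f"unavailable ({type(exc).__name__})"
    return info


if __name__ == "__main__":
    for key, value in collect_env().items():
        print(f"{key}: {value}")
```

Comparing torch.version.cuda with the toolkit version reported by nvcc --version is a common check, since a mismatch between the two can prevent the apex extensions from building.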