[ROCm] Declare gated_act_fn() as a device function
The declaration of gated_act_fn() must include __device__ for AMD compilers; otherwise the errors below occur during the inference_core_ops extension build.
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/inference/v2/kernels/core_ops/gated_activations/gated_activation_kernels.hip:25:19: error: no function template matches function template specialization 'gated_act_fn'
DS_D_INLINE float gated_act_fn<ActivationType::GEGLU>(float x, float y)
^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/inference/v2/kernels/core_ops/gated_activations/gated_activation_kernels.hip:22:7: note: candidate template ignored: target attributes do not match
float gated_act_fn(float x, float y);
^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/inference/v2/kernels/core_ops/gated_activations/gated_activation_kernels.hip:33:19: error: no function template matches function template specialization 'gated_act_fn'
DS_D_INLINE float gated_act_fn<ActivationType::ReGLU>(float x, float y)
^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/inference/v2/kernels/core_ops/gated_activations/gated_activation_kernels.hip:22:7: note: candidate template ignored: target attributes do not match
float gated_act_fn(float x, float y);
^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/inference/v2/kernels/core_ops/gated_activations/gated_activation_kernels.hip:39:19: error: no function template matches function template specialization 'gated_act_fn'
DS_D_INLINE float gated_act_fn<ActivationType::SiGLU>(float x, float y)
^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/inference/v2/kernels/core_ops/gated_activations/gated_activation_kernels.hip:22:7: note: candidate template ignored: target attributes do not match
float gated_act_fn(float x, float y);
^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/inference/v2/kernels/core_ops/gated_activations/gated_activation_kernels.hip:82:33: error: no matching function for call to 'gated_act_fn'
float act_val = gated_act::gated_act_fn<ActType>(g_val, a_val);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/inference/v2/kernels/core_ops/gated_activations/gated_activation_kernels.hip:113:9: note: in instantiation of function template specialization 'gated_activation_kernel<float, GEGLU, 1>' requested here
DISPATCH_UNROLL(1);
^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/inference/v2/kernels/core_ops/gated_activations/gated_activation_kernels.hip:92:25: note: expanded from macro 'DISPATCH_UNROLL'
hipLaunchKernelGGL(( gated_activation_kernel<T, ActType, unroll_val>) \
^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/inference/v2/kernels/core_ops/gated_activations/gated_activation_kernels.hip:142:13: note: in instantiation of function template specialization 'launch_gated_activation_impl<float, GEGLU>' requested here
launch_gated_activation_impl<T, ActivationType::GEGLU>(
^
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/inference/v2/kernels/core_ops/gated_activations/gated_activation_kernels.hip:22:7: note: candidate function not viable: call to __host__ function from __global__ function
float gated_act_fn(float x, float y);
^
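For reference, a minimal self-contained sketch of the change, assuming DS_D_INLINE expands to a device-side inline qualifier (roughly __device__ __forceinline__, as in DeepSpeed's kernel utility header); the kernel name and function body below are illustrative, not the exact DeepSpeed sources:

```cpp
// Sketch only: compiles with hipcc or nvcc; not the exact upstream diff.

enum class ActivationType { GEGLU, ReGLU, SiGLU };

// Assumption: DS_D_INLINE is a device-side inline qualifier in DeepSpeed.
#define DS_D_INLINE __device__ __forceinline__

namespace gated_act {

// The fix: the primary template declaration must itself carry __device__.
// Without it, clang for AMD treats this declaration as a __host__ candidate
// whose target attributes do not match the DS_D_INLINE specializations,
// producing the "candidate template ignored" errors shown above.
template <ActivationType ActType>
DS_D_INLINE float gated_act_fn(float x, float y);

template <>
DS_D_INLINE float gated_act_fn<ActivationType::ReGLU>(float x, float y)
{
    // ReLU(x) * y (illustrative body)
    return (x > 0.0f ? x : 0.0f) * y;
}

}  // namespace gated_act

// Hypothetical kernel demonstrating a device-side call to the specialization.
__global__ void demo_kernel(float* out, const float* gate, const float* act)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    out[idx] = gated_act::gated_act_fn<ActivationType::ReGLU>(gate[idx], act[idx]);
}
```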
@mrwyattii and @cmikeh2 - do we expect these kernels to work on AMD, and what are your thoughts on the change?
@rraminen - AMD isn't currently supported in FastGen, so does it make sense to hold this PR and merge it along with AMD support when that comes in? For now, it won't have any impact.