DeepSpeed-MII
inference_core_ops.so: undefined symbol: _Z19cuda_wf6af16_linearRN2at6TensorES1_S1_S1_S1_S1_iiii
Environment:
Ubuntu 22.04.4 LTS
CUDA compilation tools, release 12.1, V12.1.66, Build cuda_12.1.r12.1/compiler.32415258_0
ds_report output is included at the end of the description.
Issue: I'm not able to successfully run the example scripts using MII. I get the following error: inference_core_ops.so: undefined symbol: _Z19cuda_wf6af16_linearRN2at6TensorES1_S1_S1_S1_S1_iiii. However, I'm able to run DeepSpeed inference directly (not using MII) without any issues. I've tried different torch and CUDA versions; the result is the same.
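For reference, the direct (non-MII) path mentioned above typically looks something like the sketch below, using the legacy deepspeed.init_inference API. This is illustrative only, not the exact script from the report, and the arguments are assumptions:

import torch
import deepspeed
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "mistralai/Mistral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)

# Legacy inference engine: this path typically JIT-compiles the transformer_inference op
# rather than the inference_core_ops extension that fails to load under MII.
engine = deepspeed.init_inference(model, dtype=torch.float16, replace_with_kernel_inject=True)

inputs = tokenizer("DeepSpeed is", return_tensors="pt").to("cuda")
outputs = engine.module.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))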
Running the base example script:

import mii

pipe = mii.pipeline("mistralai/Mistral-7B-v0.1")
response = pipe(["DeepSpeed is", "Seattle is"], max_new_tokens=128)
print(response)

Output:
..............................................................................
[10/10] c++ core_ops.o bias_activation.o bias_activation_cuda.cuda.o layer_norm.o layer_norm_cuda.cuda.o rms_norm.o rms_norm_cuda.cuda.o gated_activation_kernels.o gated_activation_kernels_cuda.cuda.o -shared -L/home/andrew/.local/lib/python3.10/site-packages/torch/lib -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -ltorch_python -L/usr/local/cuda-12.1/lib64 -lcudart -o inference_core_ops.so
Loading extension module inference_core_ops...
Traceback (most recent call last):
File "/home/andrew/Projects/Deepspeed_examples/./ds_test.py", line 2, in
pipe = mii.pipeline("mistralai/Mistral-7B-v0.1")
File "/home/andrew/.local/lib/python3.10/site-packages/mii/api.py", line 207, in pipeline
inference_engine = load_model(model_config)
File "/home/andrew/.local/lib/python3.10/site-packages/mii/modeling/models.py", line 17, in load_model
inference_engine = build_hf_engine(
File "/home/andrew/.local/lib/python3.10/site-packages/deepspeed/inference/v2/engine_factory.py", line 129, in build_hf_engine
return InferenceEngineV2(policy, engine_config)
File "/home/andrew/.local/lib/python3.10/site-packages/deepspeed/inference/v2/engine_v2.py", line 83, in init
self._model = self._policy.build_model(self._config, self._base_mp_group)
File "/home/andrew/.local/lib/python3.10/site-packages/deepspeed/inference/v2/model_implementations/inference_policy_base.py", line 156, in build_model
self.model = self.instantiate_model(engine_config, mp_group)
File "/home/andrew/.local/lib/python3.10/site-packages/deepspeed/inference/v2/model_implementations/mistral/policy.py", line 17, in instantiate_model
return MistralInferenceModel(config=self._model_config, engine_config=engine_config, base_mp_group=mp_group)
File "/home/andrew/.local/lib/python3.10/site-packages/deepspeed/inference/v2/model_implementations/inference_transformer_base.py", line 215, in init
self.make_norm_layer()
File "/home/andrew/.local/lib/python3.10/site-packages/deepspeed/inference/v2/model_implementations/inference_transformer_base.py", line 518, in make_norm_layer
self.norm = heuristics.instantiate_pre_norm(norm_config, self._engine_config)
File "/home/andrew/.local/lib/python3.10/site-packages/deepspeed/inference/v2/modules/heuristics.py", line 167, in instantiate_pre_norm
return DSPreNormRegistry.instantiate_config(config)
File "/home/andrew/.local/lib/python3.10/site-packages/deepspeed/inference/v2/modules/module_registry.py", line 36, in instantiate_config
if not target_implementation.supports_config(config_bundle.config):
File "/home/andrew/.local/lib/python3.10/site-packages/deepspeed/inference/v2/modules/implementations/pre_norm/cuda_pre_rms.py", line 36, in supports_config
_ = CUDARMSPreNorm(config.channels, config.residual_dtype)
File "/home/andrew/.local/lib/python3.10/site-packages/deepspeed/inference/v2/kernels/core_ops/cuda_rms_norm/rms_norm_base.py", line 36, in init
self.inf_module = InferenceCoreBuilder().load()
File "/home/andrew/.local/lib/python3.10/site-packages/deepspeed/ops/op_builder/builder.py", line 479, in load
return self.jit_load(verbose)
File "/home/andrew/.local/lib/python3.10/site-packages/deepspeed/ops/op_builder/builder.py", line 523, in jit_load
op_module = load(name=self.name,
File "/home/andrew/.local/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1306, in load
return _jit_compile(
File "/home/andrew/.local/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1736, in _jit_compile
return _import_module_from_library(name, build_directory, is_python_module)
File "/home/andrew/.local/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 2132, in _import_module_from_library
module = importlib.util.module_from_spec(spec)
File "", line 571, in module_from_spec
File "", line 1176, in create_module
File "", line 241, in _call_with_frames_removed
ImportError: /home/andrew/.cache/torch_extensions/py310_cu121/inference_core_ops/inference_core_ops.so: undefined symbol: _Z19cuda_wf6af16_linearRN2at6TensorES1_S1_S1_S1_S1_iiii
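One way to confirm that the symbol really is missing from the JIT-built extension is the diagnostic sketch below. The cache path is the one from the traceback above and may differ on other machines:

import subprocess

# Path taken from the ImportError above; adjust for your torch_extensions cache.
so_path = "/home/andrew/.cache/torch_extensions/py310_cu121/inference_core_ops/inference_core_ops.so"

# nm -D lists the dynamic symbol table; a 'U' (undefined) entry for the FP6 kernel means the
# .so references cuda_wf6af16_linear but nothing was linked in to provide it, which is exactly
# what the load-time ImportError reports.
symbols = subprocess.run(["nm", "-D", so_path], capture_output=True, text=True).stdout
print([line for line in symbols.splitlines() if "wf6af16" in line])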
DS_REPORT:
JIT compiled ops requires ninja
ninja .................. [OKAY]
op name ................ installed .. compatible
async_io ............... [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
cpu_adam ............... [NO] ....... [OKAY]
cpu_adagrad ............ [NO] ....... [OKAY]
cpu_lion ............... [NO] ....... [OKAY]
evoformer_attn ......... [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
fused_lion ............. [NO] ....... [OKAY]
inference_core_ops ..... [NO] ....... [OKAY]
cutlass_ops ............ [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
ragged_device_ops ...... [NO] ....... [OKAY]
ragged_ops ............. [NO] ....... [OKAY]
random_ltd ............. [NO] ....... [OKAY]
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.2
[WARNING] using untested triton version (2.2.0), only 1.0.0 is known to be compatible
sparse_attn ............ [NO] ....... [NO]
spatial_inference ...... [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
DeepSpeed general environment info:
torch install path ............... ['/home/andrew/.local/lib/python3.10/site-packages/torch']
torch version .................... 2.2.2+cu121
deepspeed install path ........... ['/home/andrew/.local/lib/python3.10/site-packages/deepspeed']
deepspeed info ................... 0.14.0, unknown, unknown
torch cuda version ............... 12.1
torch hip version ................ None
nvcc version ..................... 12.1
deepspeed wheel compiled w. ...... torch 2.2, cuda 12.1
shared memory (/dev/shm) size .... 172.11 GB
Same problem here.
Same here, I haven't found a way to solve it.
If I use Conda and Python 3.9, I don't get this error, but the process gets stuck in the server starting phase.
I simply switched to vLLM... sorry Microsoft :(
Yep, vLLM and HF TGI are working with no issues.
It seems this issue was previously reported under different titles:
https://github.com/microsoft/DeepSpeed-MII/issues/443
Fix the FP6 kernels compilation problem on non-Ampere GPUs. microsoft/DeepSpeed#5333
Proposed workaround: downgrading to the following versions works:
deepspeed 0.13.5
deepspeed-mii 0.2.2
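For reference, the exact version pins (assuming pip manages the environment):

pip install deepspeed==0.13.5 deepspeed-mii==0.2.2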
Didn't work for me