
[BUG] Can not use deepspeed.layer.moe

Open LckyLke opened this issue 1 month ago • 1 comment

Describe the bug Importing deepspeed.moe.layer raises this ValueError: ValueError: Target parameter "qkv_w" not found in this layer. Valid targets are [] from: https://github.com/deepspeedai/DeepSpeed/blob/e993fea38efe654592b956d1ab52e340bfbf9714/deepspeed/inference/v2/model_implementations/layer_container_base.py#L97-L99

and this ValueError:

ValueError: Must have wildcard (*) in source name for ParametrizedList mapping: self_attn.q_proj.weight from: https://github.com/deepspeedai/DeepSpeed/blob/e993fea38efe654592b956d1ab52e340bfbf9714/deepspeed/inference/v2/model_implementations/layer_container_base.py#L125-L126
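For context, here is a minimal illustration (not DeepSpeed's actual implementation) of the kind of wildcard check that produces this second ValueError: a source name mapped to a ParametrizedList is expected to contain a `*` placeholder for the list index, and `self_attn.q_proj.weight` has none.

```python
def validate_list_mapping(src_name: str) -> None:
    # A ParametrizedList maps one source pattern to many parameters,
    # so the source name must carry a wildcard for the index.
    if "*" not in src_name:
        raise ValueError(
            f"Must have wildcard (*) in source name for "
            f"ParametrizedList mapping: {src_name}"
        )

# "self_attn.q_proj.weight" has no wildcard, so this raises,
# matching the error reported above.
try:
    validate_list_mapping("self_attn.q_proj.weight")
except ValueError as e:
    print(e)

validate_list_mapping("layers.*.self_attn.q_proj.weight")  # passes
```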

To Reproduce

Run this script with an install of the current master branch:

from deepspeed.moe.layer import MoE

print("Hello DeepSpeed!")

Expected behavior I can use the MoE layer normally :-)

ds_report output Running ds_report also triggers the same error. Continuing past the error instead of raising it in both cases avoids the problem, so here is the output obtained that way:

(moe) [lukef@luke HTYLLM-PG]$ ds_report
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
      runtime if needed. Op compatibility means that your system
      meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
async_io ............... [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
cpu_adam ............... [NO] ....... [OKAY]
cpu_adagrad ............ [NO] ....... [OKAY]
cpu_lion ............... [NO] ....... [OKAY]
dc ..................... [NO] ....... [OKAY]
 [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
evoformer_attn ......... [NO] ....... [NO]
 [WARNING]  FP Quantizer is using an untested triton version (3.5.0), only 2.3.(0, 1) and 3.0.0 are known to be compatible with these kernels
fp_quantizer ........... [NO] ....... [NO]
fused_lamb ............. [NO] ....... [OKAY]
fused_lion ............. [NO] ....... [OKAY]
gds .................... [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
inference_core_ops ..... [NO] ....... [OKAY]
cutlass_ops ............ [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
ragged_device_ops ...... [NO] ....... [OKAY]
ragged_ops ............. [NO] ....... [OKAY]
random_ltd ............. [NO] ....... [OKAY]
 [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.9
 [WARNING]  using untested triton version (3.5.0), only 1.0.0 is known to be compatible
sparse_attn ............ [NO] ....... [NO]
spatial_inference ...... [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
utils .................. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/home/lukef/miniconda3/envs/moe/lib/python3.14/site-packages/torch']
torch version .................... 2.9.0+cu128
deepspeed install path ........... ['/home/lukef/Dokumente/GitHub/HTYLLM-PG/DeepSpeed/deepspeed']
deepspeed info ................... 0.18.2+2f232b9d, 2f232b9d, master
torch cuda version ............... 12.8
torch hip version ................ None
nvcc version ..................... 13.0
deepspeed wheel compiled w. ...... torch 0.0, cuda 0.0
shared memory (/dev/shm) size .... 15.27 GB

System info (please complete the following information):

  • OS: 6.17.5-arch1-1
  • GPU: RTX 4070 Ti super (x1)
  • Python version: Python 3.14.0
  • Any other relevant info about your setup

LckyLke avatar Nov 05 '25 15:11 LckyLke

Same issue. Downgrading deepspeed doesn't help. But when I switched to Python 3.13 or 3.12, the error was gone. So it seems to be a bug with Python 3.14.
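Based on that observation, a hypothetical stopgap until this is fixed is to gate the import on the interpreter version and degrade gracefully on 3.14 (a sketch only, not an official workaround; `MoE` is left as `None` when the import is skipped or fails):

```python
import sys

MoE = None
if sys.version_info < (3, 14):
    try:
        # Reportedly works on Python 3.12/3.13 per the comment above.
        from deepspeed.moe.layer import MoE  # noqa: F401
    except (ImportError, ValueError):
        # ImportError: deepspeed not installed in this environment;
        # ValueError: the layer_container_base errors described in this issue.
        pass
else:
    # Skip the import entirely on 3.14, where it is known to raise.
    print("deepspeed.moe.layer is currently broken on Python 3.14; see this issue")

print("MoE available:", MoE is not None)
```

Calling code then has to check `MoE is not None` explicitly before constructing the layer.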

MeteorsHub avatar Nov 20 '25 08:11 MeteorsHub