[BUG] Cannot use deepspeed.moe.layer
Describe the bug
Importing deepspeed.moe.layer raises this ValueError:
ValueError: Target parameter "qkv_w" not found in this layer. Valid targets are []
from: https://github.com/deepspeedai/DeepSpeed/blob/e993fea38efe654592b956d1ab52e340bfbf9714/deepspeed/inference/v2/model_implementations/layer_container_base.py#L97-L99
and this ValueError:
ValueError: Must have wildcard (*) in source name for ParametrizedList mapping: self_attn.q_proj.weight
from: https://github.com/deepspeedai/DeepSpeed/blob/e993fea38efe654592b956d1ab52e340bfbf9714/deepspeed/inference/v2/model_implementations/layer_container_base.py#L125-L126
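The second message suggests the check is a plain substring test on the source name; here is a standalone paraphrase of that rule (my reading of the linked lines, not DeepSpeed's actual code) that reproduces the error text:

def check_parametrized_list_source(source_name: str) -> None:
    # Paraphrased rule: a ParametrizedList mapping must contain a wildcard
    # so one pattern can address every element of the list.
    if "*" not in source_name:
        raise ValueError(f"Must have wildcard (*) in source name for ParametrizedList mapping: {source_name}")

check_parametrized_list_source("self_attn.q_proj.weight")  # raises, matching the report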
To Reproduce
Run this script with an install of the current master branch:
from deepspeed.moe.layer import MoE
print("Hello DeepSpeed!")
Expected behavior
I can use the MoE layer normally :-)
ds_report output
Running ds_report also triggers the same errors. Letting execution continue instead of raising in both cases prevents any problems, so here is the output:
(moe) [lukef@luke HTYLLM-PG]$ ds_report
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
runtime if needed. Op compatibility means that your system
meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
async_io ............... [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
cpu_adam ............... [NO] ....... [OKAY]
cpu_adagrad ............ [NO] ....... [OKAY]
cpu_lion ............... [NO] ....... [OKAY]
dc ..................... [NO] ....... [OKAY]
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
evoformer_attn ......... [NO] ....... [NO]
[WARNING] FP Quantizer is using an untested triton version (3.5.0), only 2.3.(0, 1) and 3.0.0 are known to be compatible with these kernels
fp_quantizer ........... [NO] ....... [NO]
fused_lamb ............. [NO] ....... [OKAY]
fused_lion ............. [NO] ....... [OKAY]
gds .................... [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
inference_core_ops ..... [NO] ....... [OKAY]
cutlass_ops ............ [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
ragged_device_ops ...... [NO] ....... [OKAY]
ragged_ops ............. [NO] ....... [OKAY]
random_ltd ............. [NO] ....... [OKAY]
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.9
[WARNING] using untested triton version (3.5.0), only 1.0.0 is known to be compatible
sparse_attn ............ [NO] ....... [NO]
spatial_inference ...... [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
utils .................. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/home/lukef/miniconda3/envs/moe/lib/python3.14/site-packages/torch']
torch version .................... 2.9.0+cu128
deepspeed install path ........... ['/home/lukef/Dokumente/GitHub/HTYLLM-PG/DeepSpeed/deepspeed']
deepspeed info ................... 0.18.2+2f232b9d, 2f232b9d, master
torch cuda version ............... 12.8
torch hip version ................ None
nvcc version ..................... 13.0
deepspeed wheel compiled w. ...... torch 0.0, cuda 0.0
shared memory (/dev/shm) size .... 15.27 GB
System info (please complete the following information):
- OS: 6.17.5-arch1-1
- GPU: RTX 4070 Ti super (x1)
- Python version: Python 3.14.0
Same issue. Downgrading deepspeed doesn't help, but when I switched to Python 3.13 or 3.12 the error was gone, so it seems to be a bug specific to Python 3.14.
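A plausible (unconfirmed) explanation: Python 3.14 implements PEP 649, which defers annotation evaluation, so a metaclass that reads __annotations__ out of the raw class namespace during class creation now sees an empty dict; that would match the empty valid-targets list ([]) in the first error. A minimal sketch showing the behavior difference (plain Python, not DeepSpeed code; Meta and Layer are hypothetical names):

import sys

class Meta(type):
    def __new__(mcs, name, bases, ns):
        # On Python <= 3.13 the class namespace carries an eager
        # __annotations__ dict; on 3.14 (PEP 649) annotations are stored
        # as a lazy __annotate__ function instead, so this comes back empty.
        print(name, "annotations seen by metaclass:", ns.get("__annotations__", {}))
        return super().__new__(mcs, name, bases, ns)

class Layer(metaclass=Meta):
    qkv_w: int

print(sys.version_info[:2], "->", Layer.__annotations__)  # lazy access still resolves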