[BUG] hybrid_engine with ZeRO stage 3 appears broken
from transformers import AutoTokenizer, AutoModelForCausalLM
import deepspeed

model = AutoModelForCausalLM.from_pretrained("models/opt-6.7b")
tokenizer = AutoTokenizer.from_pretrained("models/opt-6.7b", fast_tokenizer=True)
tokenizer.padding_side = 'left'

# ZeRO stage 3 with the hybrid engine enabled and no CPU offload.
ds_config = {
    'train_micro_batch_size_per_gpu': 4,
    'steps_per_print': 10,
    'zero_optimization': {
        'stage': 3,
        'offload_param': {'device': 'none'},
        'offload_optimizer': {'device': 'none'},
        'stage3_param_persistence_threshold': 10000.0,
        'stage3_max_live_parameters': 30000000.0,
        'stage3_prefetch_bucket_size': 30000000.0,
        'memory_efficient_linear': False,
    },
    'fp16': {'enabled': True, 'loss_scale_window': 100},
    'gradient_clipping': 1.0,
    'prescale_gradients': False,
    'wall_clock_breakdown': False,
    'hybrid_engine': {
        'enabled': True,
        'inference_tp_size': 1,
        'release_inference_cache': False,
        'pin_parameters': True,
        'tp_gather_partition_size': 8,
    },
}

engine, *_ = deepspeed.initialize(model=model, config=ds_config)
engine.eval()

# Batched generation with left padding; this is where the errors below occur.
sent = ["Human: List five action models\n\nAssistant: ", "Human: hello\n\nAssistant: "]
inputs = tokenizer(sent, padding=True, return_tensors='pt').to(model.device)
gen_kwargs = {"max_length": 512}
output = model.generate(inputs["input_ids"], **gen_kwargs)
This code doesn't work; here is some of the error output:
!!!! kernel execution error. (m: 12, n: 12, k: 0, error: 7)
** On entry to GEMM_EX parameter number 9 had an illegal value
!!!! kernel execution error. (m: 0, n: 12, k: 12, error: 7)
** On entry to GEMM_EX parameter number 9 had an illegal value
!!!! kernel execution error. (m: 4096, n: 24, k: 0, error: 7)
** On entry to GEMM_EX parameter number 16 had an illegal value
!!!! kernel execution error. (m: 0, n: 24, k: 4096, error: 7)
** On entry to GEMM_EX parameter number 9 had an illegal value
......
RuntimeError: CUDA error: invalid configuration argument
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below
might be incorrect.
......
AssertionError: {'id': 2, 'status': 'AVAILABLE', 'numel': 4096, 'ds_numel': 4096, 'shape': (4096,), 'ds_shape': (4096,), 'requires_grad': True, 'grad_shape': None, 'persist': True, 'active_sub_modules': {5}, 'ds_tensor.shape': torch.Size([4096])}
However, when I set stage to 0 in ds_config, the same code runs normally, so ZeRO stage 3 appears to be broken here. My DeepSpeed version is 0.9.3 and my transformers version is 4.29.0.
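For reference, the workaround changes only a single key in the config above and leaves everything else untouched (a minimal sketch of the stage-0 fallback):

ds_config['zero_optimization']['stage'] = 0  # stage 3 triggers the failures above; stage 0 generates normally
engine, *_ = deepspeed.initialize(model=model, config=ds_config)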
I am running into the same problem.
I had the same issue and rolling back to 0.9.0 solved it.
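For anyone who wants to try the same workaround, pinning the older release is a one-liner (assuming a pip-managed environment):

pip install deepspeed==0.9.0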