[BUG] hybrid_engine for zero 3 seems invalid

Open leo5856 opened this issue 1 year ago • 2 comments

from transformers import AutoTokenizer, AutoModel, AutoModelForCausalLM, AutoConfig, get_scheduler
import deepspeed
model = AutoModelForCausalLM.from_pretrained("models/opt-6.7b")
tokenizer = AutoTokenizer.from_pretrained("models/opt-6.7b", fast_tokenizer=True)

tokenizer.padding_side = 'left'
ds_config = {
    'train_micro_batch_size_per_gpu': 4, 'steps_per_print': 10,
    'zero_optimization': {
        'stage': 3,
        'offload_param': {'device': 'none'}, 'offload_optimizer': {'device': 'none'},
        'stage3_param_persistence_threshold': 10000.0,
        'stage3_max_live_parameters': 30000000.0,
        'stage3_prefetch_bucket_size': 30000000.0,
        'memory_efficient_linear': False,
    },
    'fp16': {'enabled': True, 'loss_scale_window': 100},
    'gradient_clipping': 1.0, 'prescale_gradients': False, 'wall_clock_breakdown': False,
    'hybrid_engine': {
        'enabled': True, 'inference_tp_size': 1, 'release_inference_cache': False,
        'pin_parameters': True, 'tp_gather_partition_size': 8,
    },
}
engine, *_ = deepspeed.initialize(model=model, config=ds_config)
engine.eval()       
sent = ["Human: List five action models\n\nAssistant: ", "Human: hello\n\nAssistant: "]
inputs = tokenizer(sent, padding=True, return_tensors='pt')
inputs = inputs.to(model.device)
gen_kwargs = {"max_length": 512}
output = model.generate(inputs["input_ids"], **gen_kwargs)

This code doesn't work. Part of the error output is below:

!!!! kernel execution error. (m: 12, n: 12, k: 0, error: 7) 
 ** On entry to GEMM_EX  parameter number 9 had an illegal value
!!!! kernel execution error. (m: 0, n: 12, k: 12, error: 7) 
 ** On entry to GEMM_EX  parameter number 9 had an illegal value
!!!! kernel execution error. (m: 4096, n: 24, k: 0, error: 7) 
 ** On entry to GEMM_EX  parameter number 16 had an illegal value
!!!! kernel execution error. (m: 0, n: 24, k: 4096, error: 7) 
 ** On entry to GEMM_EX  parameter number 9 had an illegal value
......
RuntimeError: CUDA error: invalid configuration argument
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below 
might be incorrect.
......
AssertionError: {'id': 2, 'status': 'AVAILABLE', 'numel': 4096, 'ds_numel': 4096, 'shape': (4096,), 
'ds_shape': (4096,), 'requires_grad': True, 'grad_shape': None, 'persist': True, 'active_sub_modules': 
{5}, 'ds_tensor.shape': torch.Size([4096])}

However, when I set the ZeRO stage to 0 in ds_config, it runs normally, so the hybrid engine seems to be broken with ZeRO stage 3. My deepspeed version is 0.9.3 and my transformers version is 4.29.0.
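
For anyone trying to reproduce the stage 0 vs. stage 3 difference, here is a minimal sketch of how the two configs could be derived from one base dict so that the ZeRO stage is the only thing that changes. `with_zero_stage` is a hypothetical helper, not a DeepSpeed function, and the base config below is a shortened version of the one above.

```python
import copy


def with_zero_stage(config, stage):
    """Hypothetical helper (not part of DeepSpeed): return a deep copy of
    `config` in which only the ZeRO stage has been changed."""
    new_config = copy.deepcopy(config)
    new_config.setdefault("zero_optimization", {})["stage"] = stage
    return new_config


# Shortened version of the config from the report; the remaining keys
# would be carried over unchanged in the same way.
base_config = {
    "train_micro_batch_size_per_gpu": 4,
    "zero_optimization": {"stage": 3},
    "fp16": {"enabled": True},
    "hybrid_engine": {"enabled": True, "inference_tp_size": 1},
}

stage0_config = with_zero_stage(base_config, 0)  # reported to run normally
stage3_config = with_zero_stage(base_config, 3)  # reported to fail
```

Each resulting dict can then be passed as `config=` to `deepspeed.initialize` in a separate run, which keeps the comparison limited to the stage setting.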

leo5856 • May 23 '23 07:05

I ran into the same problem.

beichengus • Jun 01 '23 09:06

I had the same issue and rolling back to 0.9.0 solved it.
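
A rough sketch for checking which side of this thread's reports an installed version falls on, assuming "0.9.0" above refers to the DeepSpeed version. The helper names are made up for illustration, and the two version constants only reflect what was reported here (0.9.3 failing, 0.9.0 working), not a confirmed list of affected releases.

```python
import re
from importlib import metadata

# Versions mentioned in this thread only; other releases are unknown.
REPORTED_FAILING = (0, 9, 3)
REPORTED_WORKING = (0, 9, 0)


def parse_version(text):
    """Turn a string such as '0.9.3' or '0.9.3+cu117' into (0, 9, 3)."""
    parts = []
    for piece in text.split(".")[:3]:
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def describe(version_text):
    """Say what this thread reported for the given version string."""
    version = parse_version(version_text)
    if version == REPORTED_FAILING:
        return "reported failing in this thread"
    if version == REPORTED_WORKING:
        return "reported working in this thread"
    return "not reported in this thread"


def installed_deepspeed_version():
    """Return the installed deepspeed version string, or None if absent."""
    try:
        return metadata.version("deepspeed")
    except metadata.PackageNotFoundError:
        return None


installed = installed_deepspeed_version()
if installed is not None:
    print(installed, "->", describe(installed))
```

The rollback itself would be `pip install deepspeed==0.9.0`.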

shuoyangd • Jun 06 '23 22:06