[BUG] deepspeed.init_inference() erroneously attempts to copy out of meta tensor
The bug
In deepspeed/module_inject/replace_module.py, replace_module() is called on meta tensors before the actual weights are loaded a few lines further down, resulting in a NotImplementedError: Cannot copy out of meta tensor; no data! error.
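The failure mode itself is easy to reproduce outside DeepSpeed. A minimal sketch (plain PyTorch, not DeepSpeed code): meta tensors carry only shape/dtype metadata and no storage, so any attempt to copy or move them, like the data.to(...) call in transpose_impl(), raises this exact error.

import torch

# A meta tensor has shape and dtype but no underlying storage.
w = torch.empty(4, 4, device="meta", dtype=torch.float16)
try:
    w.to("cpu")  # same failure mode as data.to(...) in transpose_impl()
except NotImplementedError as err:
    print(err)  # Cannot copy out of meta tensor; no data!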
The code excerpt
config = AutoConfig.from_pretrained(MODEL_NAME)

with deepspeed.OnDevice(dtype=torch.float16, device='meta'):
    model = AutoModel.from_config(config, torch_dtype=torch.float16)

ds_inference_config = {
    'tensor_parallel': {'tp_size': 2},
    'dtype': torch.float16,
    'checkpoint': checkpoint_json,
    'kernel_inject': True,
}

ds_engine = deepspeed.init_inference(
    model,
    config=ds_inference_config,
)
checkpoint_json
{
    "type": "Megatron",
    "version": 1.0,
    "checkpoints": [
        "/home/user/models/opt_iml_30b/max/checkpoint_1_6000.pt-model_part-0.pt",
        "/home/user/models/opt_iml_30b/max/checkpoint_1_6000.pt-model_part-1.pt"
    ]
}
The error
Traceback (most recent call last):
File "src/test_tp.py", line 87, in <module>
init_tokenizer_model_deepspeed_w_TP('config/checkpoints-opt-iml-30b-max.json')
File "src/test_tp.py", line 57, in init_tokenizer_model_deepspeed_w_TP
ds_engine = deepspeed.init_inference(
File "/home/user/workspace/deepspeed_llm/venv/lib/python3.8/site-packages/deepspeed/__init__.py", line 311, in init_inference
engine = InferenceEngine(model, config=ds_inference_config, metaseq_opt_to_pt=metaseq_opt_to_pt)
File "/home/user/workspace/deepspeed_llm/venv/lib/python3.8/site-packages/deepspeed/inference/engine.py", line 145, in __init__
self._apply_injection_policy(config)
File "/home/user/workspace/deepspeed_llm/venv/lib/python3.8/site-packages/deepspeed/inference/engine.py", line 372, in _apply_injection_policy
replace_transformer_layer(client_module,
File "/home/user/workspace/deepspeed_llm/venv/lib/python3.8/site-packages/deepspeed/module_inject/replace_module.py", line 534, in replace_transformer_layer
replaced_module = replace_module(model=model,
File "/home/user/workspace/deepspeed_llm/venv/lib/python3.8/site-packages/deepspeed/module_inject/replace_module.py", line 800, in replace_module
replaced_module, _ = _replace_module(model, policy)
File "/home/user/workspace/deepspeed_llm/venv/lib/python3.8/site-packages/deepspeed/module_inject/replace_module.py", line 827, in _replace_module
_, layer_id = _replace_module(child, policies, layer_id=layer_id)
File "/home/user/workspace/deepspeed_llm/venv/lib/python3.8/site-packages/deepspeed/module_inject/replace_module.py", line 827, in _replace_module
_, layer_id = _replace_module(child, policies, layer_id=layer_id)
File "/home/user/workspace/deepspeed_llm/venv/lib/python3.8/site-packages/deepspeed/module_inject/replace_module.py", line 817, in _replace_module
replaced_module = policies[child.__class__][0](child,
File "/home/user/workspace/deepspeed_llm/venv/lib/python3.8/site-packages/deepspeed/module_inject/replace_module.py", line 524, in replace_fn
new_module = replace_with_policy(child,
File "/home/user/workspace/deepspeed_llm/venv/lib/python3.8/site-packages/deepspeed/module_inject/replace_module.py", line 388, in replace_with_policy
_container.transpose()
File "/home/user/workspace/deepspeed_llm/venv/lib/python3.8/site-packages/deepspeed/module_inject/containers/features/meta_tensor.py", line 35, in transpose
super().transpose()
File "/home/user/workspace/deepspeed_llm/venv/lib/python3.8/site-packages/deepspeed/module_inject/containers/base.py", line 232, in transpose
self.transpose_mlp()
File "/home/user/workspace/deepspeed_llm/venv/lib/python3.8/site-packages/deepspeed/module_inject/containers/base.py", line 241, in transpose_mlp
self._h4h_w = self.transpose_impl(self._h4h_w.data)
File "/home/user/workspace/deepspeed_llm/venv/lib/python3.8/site-packages/deepspeed/module_inject/containers/base.py", line 251, in transpose_impl
data.to(get_accelerator().current_device_name())
NotImplementedError: Cannot copy out of meta tensor; no data!
Some line numbers in the traceback may be inaccurate because it incorporates the changes from GH-2940 as well as my own code.
ds_report
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
runtime if needed. Op compatibility means that your system
meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-dev package with apt
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
cpu_adagrad ............ [NO] ....... [OKAY]
cpu_adam ............... [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
random_ltd ............. [NO] ....... [OKAY]
[WARNING] please install triton==1.0.0 if you want to use sparse attention
sparse_attn ............ [NO] ....... [NO]
spatial_inference ...... [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/home/user/workspace/deepspeed_llm/venv/lib/python3.8/site-packages/torch']
torch version .................... 1.12.1+cu113
deepspeed install path ........... ['/home/user/workspace/deepspeed_llm/venv/lib/python3.8/site-packages/deepspeed']
deepspeed info ................... 0.8.2, unknown, unknown
torch cuda version ............... 11.3
torch hip version ................ None
nvcc version ..................... 11.3
deepspeed wheel compiled w. ...... torch 1.12, cuda 11.3
System info
- Ubuntu 20.04.4 LTS
- 2x NVIDIA TITAN RTX on a single machine
- Python 3.8.1
- deepspeed==0.8.2, with inclusions of the GH-2940 changes
- accelerate==0.17.0, mpi4py==3.1.4, numpy==1.24.2, torch==1.12.1+cu113, transformers==4.26.1
Additional context
I am trying to load OPT-IML-30B, downloaded as 2 tensor-parallel (TP) shards from Metaseq, before moving on to OPT-IML-175B, which has 16 TP shards.
Please advise on how to proceed, thank you!
I also encountered this issue. I ran inference_test.py to load OPT-IML-30B downloaded from Hugging Face.
Hi, any updates?
I could successfully run the script. I first saved the sharded checkpoints to a custom directory and then loaded the sharded checkpoints for inference (#2379 helped me a lot!).
Maybe you could try setting the replace_method arg to auto; a sketch of that call is below.
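For reference, a minimal sketch of what that call could look like, reusing the config from the original report (replace_method has since been deprecated, so newer DeepSpeed releases may only emit a warning for it):

ds_engine = deepspeed.init_inference(
    model,
    config={
        'tensor_parallel': {'tp_size': 2},
        'dtype': torch.float16,
        'checkpoint': checkpoint_json,
        'kernel_inject': True,
        'replace_method': 'auto',
    },
)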
Hi @qtli, I appreciate the suggestion, but I did not use 'replace_method': 'auto', following PR #2831. I did try running it again with your suggestion for good measure, though; same error. I also did not use the method of obtaining tensor-parallel shards from HuggingFace weights, since avoiding HF is the goal (175B is not on HF). I want to use the Metaseq OPT-IML TP shards directly.
This error is encountered when 'checkpoint': checkpoint_json is used, 'replace_with_kernel_inject': True is set, and isinstance(self.module, torch.nn.Module) == True. I am not sure whether tp_size > 1 also contributes to the condition that triggers this error.
This could be related to #2616, but I am not sure. I circumvented the state_dict issues by adding custom code to _load_checkpoint() in engine.py:
def _metaseq_opt_to_pt(sd):
    keys_to_delete = [
        "decoder.version",
    ]
    for key in keys_to_delete:
        if key in sd:
            sd.pop(key)

    keys_to_rename = {
        "decoder.layer_norm.weight": "decoder.final_layer_norm.weight",
        "decoder.layer_norm.bias": "decoder.final_layer_norm.bias",
    }
    for old_key, new_key in keys_to_rename.items():
        if old_key in sd:
            sd[new_key] = sd.pop(old_key)

    for key in list(sd.keys()):
        if ".qkv_proj." in key:
            q_name = key.replace(".qkv_proj.", ".q_proj.")
            k_name = key.replace(".qkv_proj.", ".k_proj.")
            v_name = key.replace(".qkv_proj.", ".v_proj.")
            value = sd[key]
            depth = value.shape[0]
            assert depth % 3 == 0
            # In `SequeuceParallelTransformerBlock` the fused QKV weight is actually ordered K, V, Q despite the naming:
            # https://cs.github.com/facebookresearch/metaseq/blob/51871bd73cd04c038f239ea2a26db1d7f6b37927/metaseq/modules/sequence_parallel_transformer_layer.py#L97
            k, v, q = torch.split(value, depth // 3, dim=0)
            sd[q_name] = q
            sd[k_name] = k
            sd[v_name] = v
            del sd[key]
    return sd
...
checkpoint[self._choose_module_key(checkpoint)] = _metaseq_opt_to_pt(checkpoint[self._choose_module_key(checkpoint)])

self.module.load_state_dict(
    state_dict=checkpoint[self._choose_module_key(checkpoint)],
    strict=load_module_strict)
Would appreciate any help or suggestion.
I am seeing this error too. DeepSpeed version 0.9.2
config = AutoConfig.from_pretrained(model_name)

with deepspeed.OnDevice(dtype=dtype, device="meta"):
    model = AutoModelForCausalLM.from_config(config)

model = deepspeed.init_inference(
    model,
    tensor_parallel=tp_config,
    base_dir=repo_root,
    replace_with_kernel_inject=args.kernel_injection,
    **kwargs
)
With replace_with_kernel_inject = False, I get this error:
model = deepspeed.init_inference(
File "/opt/conda/envs/inference/lib/python3.9/site-packages/deepspeed/__init__.py", line 333, in init_inference
engine = InferenceEngine(model, config=ds_inference_config)
File "/opt/conda/envs/inference/lib/python3.9/site-packages/deepspeed/inference/engine.py", line 204, in __init__
self._apply_injection_policy(config, client_module)
File "/opt/conda/envs/inference/lib/python3.9/site-packages/deepspeed/inference/engine.py", line 396, in _apply_injection_policy
replace_transformer_layer(client_module, self.module, checkpoint, config, self.config)
File "/opt/conda/envs/inference/lib/python3.9/site-packages/deepspeed/module_inject/replace_module.py", line 494, in replace_transformer_layer
replaced_module = replace_module(model=model,
File "/opt/conda/envs/inference/lib/python3.9/site-packages/deepspeed/module_inject/replace_module.py", line 727, in replace_module
replaced_module, _ = _replace_module(model, policy)
File "/opt/conda/envs/inference/lib/python3.9/site-packages/deepspeed/module_inject/replace_module.py", line 752, in _replace_module
_, layer_id = _replace_module(child, policies, layer_id=layer_id)
File "/opt/conda/envs/inference/lib/python3.9/site-packages/deepspeed/module_inject/replace_module.py", line 752, in _replace_module
_, layer_id = _replace_module(child, policies, layer_id=layer_id)
File "/opt/conda/envs/inference/lib/python3.9/site-packages/deepspeed/module_inject/replace_module.py", line 752, in _replace_module
_, layer_id = _replace_module(child, policies, layer_id=layer_id)
File "/opt/conda/envs/inference/lib/python3.9/site-packages/deepspeed/module_inject/replace_module.py", line 744, in _replace_module
replaced_module = policies[child.__class__][0](child, policies[child.__class__][-1], layer_id)
File "/opt/conda/envs/inference/lib/python3.9/site-packages/deepspeed/module_inject/replace_module.py", line 490, in replace_fn
new_module = replace_wo_policy(child, _policy)
File "/opt/conda/envs/inference/lib/python3.9/site-packages/deepspeed/module_inject/replace_module.py", line 473, in replace_wo_policy
return _replace_module(module)
File "/opt/conda/envs/inference/lib/python3.9/site-packages/deepspeed/module_inject/replace_module.py", line 470, in _replace_module
_replace_module(child, name)
File "/opt/conda/envs/inference/lib/python3.9/site-packages/deepspeed/module_inject/replace_module.py", line 466, in _replace_module
setattr(r_module, name, linear_policies[child.__class__](child, prev_name + '.' + name,
File "/opt/conda/envs/inference/lib/python3.9/site-packages/deepspeed/module_inject/replace_module.py", line 392, in _replace
data = mp_replace.copy(new_weight, child.weight.data)
File "/opt/conda/envs/inference/lib/python3.9/site-packages/deepspeed/module_inject/replace_module.py", line 89, in copy
assert not dst.data.is_meta # the torch.Tensor.copy_ method used below will silently fail on meta tensors
With replace_with_kernel_inject = True:
File "/opt/conda/envs/inference/lib/python3.9/site-packages/deepspeed/__init__.py", line 333, in init_inference
engine = InferenceEngine(model, config=ds_inference_config)
File "/opt/conda/envs/inference/lib/python3.9/site-packages/deepspeed/inference/engine.py", line 207, in __init__
self.module.to(device)
File "/opt/conda/envs/inference/lib/python3.9/site-packages/transformers/modeling_utils.py", line 1896, in to
return super().to(*args, **kwargs)
File "/opt/conda/envs/inference/lib/python3.9/site-packages/torch/nn/modules/module.py", line 927, in to
return self._apply(convert)
File "/opt/conda/envs/inference/lib/python3.9/site-packages/torch/nn/modules/module.py", line 579, in _apply
module._apply(fn)
File "/opt/conda/envs/inference/lib/python3.9/site-packages/torch/nn/modules/module.py", line 579, in _apply
module._apply(fn)
File "/opt/conda/envs/inference/lib/python3.9/site-packages/torch/nn/modules/module.py", line 579, in _apply
module._apply(fn)
File "/opt/conda/envs/inference/lib/python3.9/site-packages/torch/nn/modules/module.py", line 602, in _apply
param_applied = fn(param)
File "/opt/conda/envs/inference/lib/python3.9/site-packages/torch/nn/modules/module.py", line 925, in convert
return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
NotImplementedError: Cannot copy out of meta tensor; no data!
Does this depend on which weights are being loaded? I am running OPT from Hugging Face.
@molohov Hi, have you solved this problem?
The same issue here.
The same issue.
I had some success loading the model this way:
with deepspeed.OnDevice(dtype=dtype, device="meta"):
    model = AutoModelForCausalLM.from_pretrained(model_name, low_cpu_mem_usage=True)

model = deepspeed.init_inference(
    model,
    tensor_parallel=tp_config,
    base_dir=repo_root,
    replace_with_kernel_inject=args.kernel_injection,
    **kwargs
)
I think this is because low_cpu_mem_usage=True initializes the HF model with meta tensors for you, allowing DS to copy it correctly; a quick sanity check is sketched below.
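If it helps with debugging, a minimal check (assuming transformers is installed and model_name is the same checkpoint name as above) is to count how many parameters remain on the meta device after loading, since only materialized weights have data for DeepSpeed to copy:

from transformers import AutoModelForCausalLM

# Load the checkpoint the same way as above and look for leftover meta tensors;
# any parameter with is_meta == True has no data for DeepSpeed to copy.
model = AutoModelForCausalLM.from_pretrained(model_name, low_cpu_mem_usage=True)
n_meta = sum(p.is_meta for p in model.parameters())
n_total = sum(1 for _ in model.parameters())
print(f"meta parameters: {n_meta} / {n_total}")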
Does this really work for anyone? With OPT this fails for me.