DeepSpeed
[BUG] Loading checkpoint in the inference script fails when not using kernel injection
Describe the bug
I was running the DeepSpeed inference example with kernel injection set to False, and the script fails to load checkpoints. (With kernel injection set to True, it works.)
I'm using an OPT model this time; I reported the same issue here a few days ago for BLOOM but got no response. I believe this is a model-agnostic script issue.
To Reproduce
deepspeed --num_gpus=2 inference/huggingface/text-generation/inference-test.py --name "facebook/opt-13b" --dtype float16 --use_meta_tensor --replace_method "auto" --ds_inference
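For context, the failing call in the example script boils down to roughly the following (my simplified reconstruction, not the exact inference-test.py source; the checkpoints.json filename and the pipeline setup are assumptions):

```python
import torch
import deepspeed
from transformers import pipeline

# Simplified reconstruction of the failing path in inference-test.py
# (assumption: the script builds a HF pipeline, then wraps its model).
pipe = pipeline("text-generation", model="facebook/opt-13b", torch_dtype=torch.float16)

pipe.model = deepspeed.init_inference(
    pipe.model,
    mp_size=2,                         # tensor-parallel degree (deprecated alias, see warning above)
    dtype=torch.float16,
    checkpoint="checkpoints.json",     # meta-tensor checkpoint description (hypothetical filename)
    replace_with_kernel_inject=False,  # kernel injection disabled -> fails as below
)
```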
Error:
[2023-04-18 19:18:25,836] [INFO] [launch.py:229:main] WORLD INFO DICT: {'localhost': [0, 1]}
[2023-04-18 19:18:25,837] [INFO] [launch.py:235:main] nnodes=1, num_local_procs=2, node_rank=0
[2023-04-18 19:18:25,837] [INFO] [launch.py:246:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1]})
[2023-04-18 19:18:25,837] [INFO] [launch.py:247:main] dist_world_size=2
[2023-04-18 19:18:25,837] [INFO] [launch.py:249:main] Setting CUDA_VISIBLE_DEVICES=0,1
[2023-04-18 19:18:27,949] [INFO] [utils.py:785:see_memory_usage] before init
[2023-04-18 19:18:27,949] [INFO] [utils.py:786:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB
[2023-04-18 19:18:27,950] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory: used = 5.36 GB, percent = 2.9%
Fetching 23 files: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 23/23 [00:00<00:00, 68906.42it/s]
[2023-04-18 19:18:28,418] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed info: version=0.9.0, git-hash=unknown, git-branch=unknown
[2023-04-18 19:18:28,419] [WARNING] [config_utils.py:69:_process_deprecated_field] Config parameter replace_method is deprecated. This parameter is no longer needed, please remove from your call to DeepSpeed-inference
[2023-04-18 19:18:28,419] [WARNING] [config_utils.py:69:_process_deprecated_field] Config parameter mp_size is deprecated use tensor_parallel.tp_size instead
[2023-04-18 19:18:28,419] [INFO] [logging.py:96:log_dist] [Rank -1] quantize_bits = 8 mlp_extra_grouping = False, quantize_groups = 1
Traceback (most recent call last):
  File "/home/xx/DeepSpeedExamples/inference/huggingface/text-generation/inference-test.py", line 74, in <module>
    pipe.model = deepspeed.init_inference(pipe.model,
  File "/home/xx/venv/lib/python3.9/site-packages/deepspeed/__init__.py", line 324, in init_inference
    engine = InferenceEngine(model, config=ds_inference_config)
  File "/home/xx/venv/lib/python3.9/site-packages/deepspeed/inference/engine.py", line 155, in __init__
    self._load_checkpoint(config.checkpoint)
  File "/home/xx/venv/lib/python3.9/site-packages/deepspeed/inference/engine.py", line 450, in _load_checkpoint
    load_path, checkpoint, quantize_config = sd_loader.load(self._config.tensor_parallel.tp_size,
AttributeError: 'dict' object has no attribute 'load'
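Judging from the traceback, _load_checkpoint expects sd_loader to be an object exposing a .load(...) method, but on the non-kernel-injection path it apparently receives the parsed checkpoint JSON as a plain dict. A minimal illustration of the mismatch (the dict contents are made up):

```python
# Illustration only: on this code path the parsed checkpoint-description
# JSON apparently arrives as a plain dict instead of an SDLoader-style
# object with a .load() method (dict contents here are invented).
sd_loader = {"type": "ds_model", "checkpoints": ["model-00001.pt"], "version": 1.0}

tp_size = 2
# engine.py line 450 then effectively runs (remaining arguments elided):
load_path, checkpoint, quantize_config = sd_loader.load(tp_size)
# -> AttributeError: 'dict' object has no attribute 'load'
```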
Expected behavior
We should be able to load checkpoints with meta tensors when not using kernel injection.
ds_report output
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
runtime if needed. Op compatibility means that your system
meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-dev package with apt
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
cpu_adagrad ............ [NO] ....... [OKAY]
cpu_adam ............... [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
random_ltd ............. [NO] ....... [OKAY]
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.0
[WARNING] using untested triton version (2.0.0), only 1.0.0 is known to be compatible
sparse_attn ............ [NO] ....... [NO]
spatial_inference ...... [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/home/xx/venv/lib/python3.9/site-packages/torch']
torch version .................... 2.0.0+cu117
deepspeed install path ........... ['/home/xx/venv/lib/python3.9/site-packages/deepspeed']
deepspeed info ................... 0.9.0, unknown, unknown
torch cuda version ............... 11.7
torch hip version ................ None
nvcc version ..................... 11.7
deepspeed wheel compiled w. ...... torch 2.0, cuda 11.7
Also running into this issue.
Hi @brevity2021,
Meta tensor checkpoint loading is only supported when kernel injection is enabled. Please provide the --use_kernel argument to the inference-test.py script when running this example with meta tensors.
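For reference, the working invocation adds --use_kernel to the reproduction command above:

deepspeed --num_gpus=2 inference/huggingface/text-generation/inference-test.py --name "facebook/opt-13b" --dtype float16 --use_meta_tensor --use_kernel --replace_method "auto" --ds_inference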
Additional assertions have also been added on the DeepSpeed InferenceEngine side to check that kernel injection is enabled when meta tensors are used: https://github.com/microsoft/DeepSpeed/pull/2940.
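Roughly speaking, the check amounts to something like this (a simplified sketch, not the literal diff from the PR):

```python
def _check_meta_tensor_support(checkpoint, replace_with_kernel_inject):
    # Simplified sketch (not the literal PR #2940 diff): reject meta-tensor
    # checkpoint loading when kernel injection is disabled.
    if checkpoint is not None and not replace_with_kernel_inject:
        raise AssertionError(
            "Meta tensor checkpoint loading requires kernel injection; "
            "pass replace_with_kernel_inject=True to deepspeed.init_inference().")
```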
Thanks, Lev
@lekurile Is that still true after https://github.com/microsoft/DeepSpeed/pull/3102 was merged?