DeepSpeed
[BUG] Loading checkpoint in the inference script fails when not using kernel injection
Describe the bug
I was running the DeepSpeed inference example with kernel injection set to False, and the script fails to load checkpoints. (With kernel injection set to True, it works.)
I'm using an OPT model this time; I reported the same issue here a few days ago for BLOOM but got no response. I believe this is a model-agnostic script issue.
To Reproduce
deepspeed --num_gpus=2 inference/huggingface/text-generation/inference-test.py --name "facebook/opt-13b" --dtype float16 --use_meta_tensor --replace_method "auto" --ds_inference
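For context, the failing call in the example script boils down to roughly the following (my simplified reconstruction, not the exact inference-test.py source; the checkpoints.json filename and the pipeline setup are assumptions):

```python
import torch
import deepspeed
from transformers import pipeline

# Simplified reconstruction of the failing path in inference-test.py
# (assumption: the script builds a HF pipeline, then wraps its model).
pipe = pipeline("text-generation", model="facebook/opt-13b", torch_dtype=torch.float16)

pipe.model = deepspeed.init_inference(
    pipe.model,
    mp_size=2,                         # tensor-parallel degree (deprecated alias, see warning above)
    dtype=torch.float16,
    checkpoint="checkpoints.json",     # meta-tensor checkpoint description (hypothetical filename)
    replace_with_kernel_inject=False,  # kernel injection disabled -> fails as below
)
```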
Error:
[2023-04-18 19:18:25,836] [INFO] [launch.py:229:main] WORLD INFO DICT: {'localhost': [0, 1]}
[2023-04-18 19:18:25,837] [INFO] [launch.py:235:main] nnodes=1, num_local_procs=2, node_rank=0
[2023-04-18 19:18:25,837] [INFO] [launch.py:246:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1]})
[2023-04-18 19:18:25,837] [INFO] [launch.py:247:main] dist_world_size=2
[2023-04-18 19:18:25,837] [INFO] [launch.py:249:main] Setting CUDA_VISIBLE_DEVICES=0,1
[2023-04-18 19:18:27,949] [INFO] [utils.py:785:see_memory_usage] before init
[2023-04-18 19:18:27,949] [INFO] [utils.py:786:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB
[2023-04-18 19:18:27,950] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory: used = 5.36 GB, percent = 2.9%
Fetching 23 files: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 23/23 [00:00<00:00, 68906.42it/s]
[2023-04-18 19:18:28,418] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed info: version=0.9.0, git-hash=unknown, git-branch=unknown
[2023-04-18 19:18:28,419] [WARNING] [config_utils.py:69:_process_deprecated_field] Config parameter replace_method is deprecated. This parameter is no longer needed, please remove from your call to DeepSpeed-inference
[2023-04-18 19:18:28,419] [WARNING] [config_utils.py:69:_process_deprecated_field] Config parameter mp_size is deprecated use tensor_parallel.tp_size instead
[2023-04-18 19:18:28,419] [INFO] [logging.py:96:log_dist] [Rank -1] quantize_bits = 8 mlp_extra_grouping = False, quantize_groups = 1
Traceback (most recent call last):
  File "/home/xx/DeepSpeedExamples/inference/huggingface/text-generation/inference-test.py", line 74, in <module>
    pipe.model = deepspeed.init_inference(pipe.model,
  File "/home/xx/venv/lib/python3.9/site-packages/deepspeed/__init__.py", line 324, in init_inference
    engine = InferenceEngine(model, config=ds_inference_config)
  File "/home/xx/venv/lib/python3.9/site-packages/deepspeed/inference/engine.py", line 155, in __init__
    self._load_checkpoint(config.checkpoint)
  File "/home/xx/venv/lib/python3.9/site-packages/deepspeed/inference/engine.py", line 450, in _load_checkpoint
    load_path, checkpoint, quantize_config = sd_loader.load(self._config.tensor_parallel.tp_size,
AttributeError: 'dict' object has no attribute 'load'
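Judging from the traceback, _load_checkpoint expects sd_loader to be an object exposing a .load(...) method, but on the non-kernel-injection path it apparently receives the parsed checkpoint JSON as a plain dict. A minimal illustration of the mismatch (the dict contents are made up):

```python
# Illustration only: on this code path the parsed checkpoint-description
# JSON apparently arrives as a plain dict instead of an SDLoader-style
# object with a .load() method (dict contents here are invented).
sd_loader = {"type": "ds_model", "checkpoints": ["model-00001.pt"], "version": 1.0}

tp_size = 2
# engine.py line 450 then effectively runs (remaining arguments elided):
load_path, checkpoint, quantize_config = sd_loader.load(tp_size)
# -> AttributeError: 'dict' object has no attribute 'load'
```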
Expected behavior
We should be able to load checkpoints with meta tensors when not using kernel injection.
ds_report output
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
runtime if needed. Op compatibility means that your system
meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-dev package with apt
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
cpu_adagrad ............ [NO] ....... [OKAY]
cpu_adam ............... [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
random_ltd ............. [NO] ....... [OKAY]
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.0
[WARNING] using untested triton version (2.0.0), only 1.0.0 is known to be compatible
sparse_attn ............ [NO] ....... [NO]
spatial_inference ...... [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/home/xx/venv/lib/python3.9/site-packages/torch']
torch version .................... 2.0.0+cu117
deepspeed install path ........... ['/home/xx/venv/lib/python3.9/site-packages/deepspeed']
deepspeed info ................... 0.9.0, unknown, unknown
torch cuda version ............... 11.7
torch hip version ................ None
nvcc version ..................... 11.7
deepspeed wheel compiled w. ...... torch 2.0, cuda 11.7
Also running into this issue.
Hi @brevity2021,
Meta tensor checkpoint loading is only supported when kernel injection is enabled. Please provide the --use_kernel argument to the inference-test.py script when running this example with meta tensors.
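For reference, the working invocation adds --use_kernel to the reproduction command above:

deepspeed --num_gpus=2 inference/huggingface/text-generation/inference-test.py --name "facebook/opt-13b" --dtype float16 --use_meta_tensor --use_kernel --replace_method "auto" --ds_inference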
Additional assertions have also been added on the DeepSpeed InferenceEngine side to check that kernel injection is enabled when meta tensors are used: https://github.com/microsoft/DeepSpeed/pull/2940.
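Roughly speaking, the check amounts to something like this (a simplified sketch, not the literal diff from the PR):

```python
def _check_meta_tensor_support(checkpoint, replace_with_kernel_inject):
    # Simplified sketch (not the literal PR #2940 diff): reject meta-tensor
    # checkpoint loading when kernel injection is disabled.
    if checkpoint is not None and not replace_with_kernel_inject:
        raise AssertionError(
            "Meta tensor checkpoint loading requires kernel injection; "
            "pass replace_with_kernel_inject=True to deepspeed.init_inference().")
```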
Thanks, Lev
@lekurile Is that still true after https://github.com/microsoft/DeepSpeed/pull/3102 was merged?