[BUG] Mis-typed free_blocks

Open lshamis opened this issue 2 months ago • 0 comments

Describe the bug

The free_blocks variable is used with inconsistent types. It appears to be a list[int], but some call sites treat it as an int, which crashes the program.

It is used correctly in deepspeed/inference/v2/model_implementations/inference_transformer_base.py:
https://github.com/microsoft/DeepSpeed/blob/a8b821535aa0b254efa681d51b4951734ca021cc/deepspeed/inference/v2/model_implementations/inference_transformer_base.py#L366-L367

But it is used incorrectly in deepspeed/inference/v2/engine_v2.py:
https://github.com/microsoft/DeepSpeed/blob/a8b821535aa0b254efa681d51b4951734ca021cc/deepspeed/inference/v2/engine_v2.py#L200
https://github.com/microsoft/DeepSpeed/blob/a8b821535aa0b254efa681d51b4951734ca021cc/deepspeed/inference/v2/engine_v2.py#L214

To Reproduce

from mii.config import ModelConfig
from mii.modeling.models import load_model
from mii.modeling.tokenizers import load_tokenizer

config = ModelConfig(model_name_or_path="/path/to/llama-2-7b-hf", max_length=4096)
engine = load_model(config)
model = engine.model()
tokenizer = load_tokenizer(config)

tokens = tokenizer.encode("It is a period of civil wars in the galaxy. A brave alliance")

engine.put(batch_uids=[128], batch_tokens=[tokens])

Running this script crashes with:

Traceback (most recent call last):
  File "/tmp/tmp.py", line 86, in <module>
    main()
  File "/tmp/tmp.py", line 60, in main
    engine.put(batch_uids=[128], batch_tokens=[tokens])
  File "/path/conda/ds/lib/python3.11/site-packages/deepspeed/inference/v2/engine_v2.py", line 124, in put
    schedule_check = self.can_schedule(batch_uids, token_lens)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/path/conda/ds/lib/python3.11/site-packages/deepspeed/inference/v2/engine_v2.py", line 214, in can_schedule
    sched_len, sched_blocks = self._model.get_kv_requirements(seq_desc, length, free_blocks)
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/path/conda/ds/lib/python3.11/site-packages/deepspeed/inference/v2/model_implementations/inference_transformer_base.py", line 348, in get_kv_requirements
    if block_lim <= max_new_blocks:
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: '<=' not supported between instances of 'int' and 'list'
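
The failing comparison can be reproduced without DeepSpeed. The sketch below is only an illustration: blocks_fit and blocks_fit_normalized are hypothetical helpers, not DeepSpeed functions, and the min() reduction is an assumption about how a per-group list might be collapsed to a single limit, not the project's actual fix.

```python
from typing import List, Union


def blocks_fit(block_lim: int, max_new_blocks: Union[int, List[int]]) -> bool:
    # Hypothetical stand-in for the comparison shown in the traceback:
    # it works for int vs. int and raises TypeError for int vs. list.
    return block_lim <= max_new_blocks


def blocks_fit_normalized(block_lim: int, max_new_blocks: Union[int, List[int]]) -> bool:
    # Assumption: if a list of free-block counts is passed, the smallest
    # entry is the effective limit. This is illustrative only.
    if isinstance(max_new_blocks, (list, tuple)):
        max_new_blocks = min(max_new_blocks)
    return block_lim <= max_new_blocks


error_message = None
try:
    blocks_fit(4, [16])  # int compared against list, as in the crash
except TypeError as exc:
    error_message = str(exc)

print(error_message)
print(blocks_fit_normalized(4, [16]))
```

Whichever side is changed, the caller in engine_v2.py and the callee get_kv_requirements need to agree on whether the third argument is a scalar or a sequence.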

Expected behavior

engine.put should return the logits.

ds_report output TODO

Screenshots TODO

System info (please complete the following information): TODO

Docker context TODO

Additional context TODO

lshamis, Apr 12 '24 20:04