DeepSpeed
[BUG] Mis-typed free_blocks
Describe the bug
The free_blocks variable is used with inconsistent types. It appears to be a list[int], but some call sites treat it as an int, which crashes the program.
It is used correctly in deepspeed/inference/v2/model_implementations/inference_transformer_base.py: https://github.com/microsoft/DeepSpeed/blob/a8b821535aa0b254efa681d51b4951734ca021cc/deepspeed/inference/v2/model_implementations/inference_transformer_base.py#L366-L367
But it is used incorrectly in deepspeed/inference/v2/engine_v2.py: https://github.com/microsoft/DeepSpeed/blob/a8b821535aa0b254efa681d51b4951734ca021cc/deepspeed/inference/v2/engine_v2.py#L200 https://github.com/microsoft/DeepSpeed/blob/a8b821535aa0b254efa681d51b4951734ca021cc/deepspeed/inference/v2/engine_v2.py#L214
To Reproduce
from mii.config import ModelConfig
from mii.modeling.models import load_model
from mii.modeling.tokenizers import load_tokenizer
config = ModelConfig(model_name_or_path="/path/to/llama-2-7b-hf", max_length=4096)
engine = load_model(config)
model = engine.model()
tokenizer = load_tokenizer(config)
tokens = tokenizer.encode("It is a period of civil wars in the galaxy. A brave alliance")
engine.put(batch_uids=[128], batch_tokens=[tokens])
When run, it will crash with:
Traceback (most recent call last):
File "/tmp/tmp.py", line 86, in <module>
main()
File "/tmp/tmp.py", line 60, in main
engine.put(batch_uids=[128], batch_tokens=[tokens])
File "/path/conda/ds/lib/python3.11/site-packages/deepspeed/inference/v2/engine_v2.py", line 124, in put
schedule_check = self.can_schedule(batch_uids, token_lens)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/path/conda/ds/lib/python3.11/site-packages/deepspeed/inference/v2/engine_v2.py", line 214, in can_schedule
sched_len, sched_blocks = self._model.get_kv_requirements(seq_desc, length, free_blocks)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/path/conda/ds/lib/python3.11/site-packages/deepspeed/inference/v2/model_implementations/inference_transformer_base.py", line 348, in get_kv_requirements
if block_lim <= max_new_blocks:
^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: '<=' not supported between instances of 'int' and 'list'
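The failure mode can be reproduced without DeepSpeed. The following is a minimal sketch, not the library's code: the function `fits_in_free_blocks` and the sample values are hypothetical, and it only assumes what the traceback shows, namely that an int (`block_lim`) is compared against a value that turns out to be a list.

```python
# Hypothetical stand-in for the comparison in get_kv_requirements;
# the real DeepSpeed function takes different arguments.
def fits_in_free_blocks(block_lim: int, max_new_blocks) -> bool:
    return block_lim <= max_new_blocks


# Assumed shape of free_blocks: a list with one entry per cache group.
free_blocks = [12]

# Passing the whole list reproduces the TypeError from the traceback.
try:
    fits_in_free_blocks(4, free_blocks)
    error_message = ""
except TypeError as exc:
    error_message = str(exc)

# Indexing into the list first gives the int that the comparison expects.
fits = fits_in_free_blocks(4, free_blocks[0])

print(error_message)
print(fits)
```

Whether the right fix is to index the list at the call site or to accept a list inside the callee depends on the intended contract of free_blocks, which the maintainers would need to confirm.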
Expected behavior
engine.put should return the logits.
ds_report output TODO
Screenshots TODO
System info (please complete the following information): TODO
Docker context TODO
Additional context TODO