DeepSpeed-MII
New microsoft/bloom-deepspeed-inference-fp16 weights not working with DeepSpeed MII
The new microsoft/bloom-deepspeed-inference-fp16 and microsoft/bloom-deepspeed-inference-int8 weights are not working with DeepSpeed MII.
@jeffra @RezaYazdaniAminabadi
Traceback (most recent call last):
  File "scripts/bloom-inference-server/server.py", line 83, in <module>
    model = DSInferenceGRPCServer(args)
  File "/net/llm-shared-nfs/nfs/mayank/BigScience-Megatron-DeepSpeed/scripts/bloom-inference-server/ds_inference/grpc_server.py", line 36, in __init__
    mii.deploy(
  File "/net/llm-shared-nfs/nfs/yelkurdi/conda/miniconda3/envs/llmpt/lib/python3.8/site-packages/mii/deployment.py", line 70, in deploy
    mii.utils.check_if_task_and_model_is_valid(task, model)
  File "/net/llm-shared-nfs/nfs/yelkurdi/conda/miniconda3/envs/llmpt/lib/python3.8/site-packages/mii/utils.py", line 108, in check_if_task_and_model_is_valid
    assert (
AssertionError: text-generation only supports [.....]
The list of models doesn't contain the new weights. It seems like there is a check in place that is preventing the new weights from working with MII.
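For context, the failing assertion is just a membership check against MII's allow-list of supported models for the task. Paraphrased (this is not the exact mii/utils.py source, and the allow-list entries below are illustrative only), it amounts to something like:

# Paraphrased illustration of the failing check (not the exact mii/utils.py source):
# the requested model must appear in a hard-coded allow-list for the task.
SUPPORTED_TEXT_GENERATION_MODELS = ["bigscience/bloom", "gpt2"]  # illustrative entries only

def check_if_task_and_model_is_valid(task, model_name):
    assert model_name in SUPPORTED_TEXT_GENERATION_MODELS, (
        f"{task} only supports {SUPPORTED_TEXT_GENERATION_MODELS}")

check_if_task_and_model_is_valid("text-generation",
                                 "microsoft/bloom-deepspeed-inference-fp16")  # raises AssertionError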
Any updates on this? @jeffra @RezaYazdaniAminabadi
Also, the same thing happens with bigscience/bloom-350m for some reason.
I just ran the example in the README and got the AssertionError: text-generation only supports [.....] error.
https://github.com/huggingface/transformers-bloom-inference/blob/abe365066fec6e03ce0ea2cc8136f2da1254e2ea/bloom-inference-server/ds_inference/grpc_server.py#L33 @cderinbogaz I hacked my way around it for now: I pass the downloaded model path and checkpoint dict for the model I need to use, and set model="bigscience/bloom".
I know this is not the most elegant way to do this :(
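Roughly, the hack looks like this (a sketch only: the local path is a placeholder, and depending on the MII version you may also need to pass the DS-inference checkpoint metadata, which I'm omitting here):

import mii

# Example config; dtype and tensor_parallel should match the downloaded weights.
mii_configs = {
    "dtype": "fp16",
    "tensor_parallel": 8,
}

mii.deploy(
    task="text-generation",
    # Register under the vanilla model name so MII's supported-model check passes...
    model="bigscience/bloom",
    deployment_name="bloom_deployment",
    # ...but point model_path at the pre-downloaded DS-inference weights (placeholder path).
    model_path="/path/to/microsoft/bloom-deepspeed-inference-fp16",
    mii_config=mii_configs,
)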
Thanks for the response @mayank31398! I think it's a neat solution :)
@mrwyattii I believe your commit yesterday has fixed this? Let me know. I am closely watching this repo :)
    weight_quantizer.quantize(transpose(sd[0][prefix + 'self_attention.query_key_value.' + 'weight'])))
  File "/opt/conda/lib/python3.7/site-packages/deepspeed/module_inject/replace_module.py", line 100, in copy
    dim=self.in_dim)[self.gpu_index].to(
This is the error I got today while trying int8 inference with bloom.
Hi @TahaBinhuraib I think MII doesn't support int8 models. Can you try vanilla DS-inference?
https://github.com/huggingface/transformers-bloom-inference/tree/main/bloom-inference-server You can try running it via the CLI or deploy a generation server following the instructions there ^^.
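For reference, a minimal sketch of what vanilla DS-inference looks like with a small BLOOM checkpoint (the model name, dtype, and parallel degree are example values; launch the script with the deepspeed launcher, e.g. deepspeed --num_gpus <N> script.py):

import os
import torch
import deepspeed
from transformers import AutoModelForCausalLM, AutoTokenizer

# Example values only; substitute the model you are actually testing.
model_name = "bigscience/bloom-560m"
local_rank = int(os.getenv("LOCAL_RANK", "0"))
world_size = int(os.getenv("WORLD_SIZE", "1"))

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)

# Wrap the model with DeepSpeed's inference engine (fused kernels injected).
model = deepspeed.init_inference(
    model,
    mp_size=world_size,
    dtype=torch.float16,
    replace_with_kernel_inject=True,
)

inputs = tokenizer("DeepSpeed is", return_tensors="pt").to(f"cuda:{local_rank}")
outputs = model.module.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))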
The fp16 Bloom weights are now supported. Int8 models are also supported, but currently the DeepSpeed sharded int8 weights for the Bloom model will throw an error. I'm working on a fix for this and automatic loading of the sharded weights (so you don't have to manually download the weights and define the checkpoint file list). Those changes will come in #69 and likely another PR.
Thanks @mrwyattii
Thanks @mrwyattii can't wait!
@mayank31398 @TahaBinhuraib I finally found the time to fix #69 so that it works with int8. You no longer need to download the sharded checkpoint files separately and MII will handle this for you (but it will take a while as the checkpoints are quite large). I just confirmed that it's working on my side, but if you have the opportunity to test it out, please do. The script I used:
import mii

mii_configs = {
    "dtype": "int8",
    "tensor_parallel": 4,
    "port_number": 50950,
}
name = "microsoft/bloom-deepspeed-inference-int8"
mii.deploy(task='text-generation',
           model=name,
           deployment_name="bloom_deployment",
           model_path="/data/bloom-ckpts",
           mii_config=mii_configs)
You will probably want to change the model_path parameter if you run this on your local machine.
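Once the deployment is up, querying it should look roughly like this (based on the standard MII query handle; the prompts and generation kwargs are just examples):

import mii

# Connect to the deployment started above and send a generation request.
generator = mii.mii_query_handle("bloom_deployment")
result = generator.query(
    {"query": ["DeepSpeed is", "Seattle is"]},
    do_sample=True,
    max_new_tokens=30,
)
print(result)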