transformers-bloom-inference
"bloom-ds-zero-inference.py" works but "inference_server.cli --deployment_framework ds_zero" fails
Running deepspeed --num_gpus 4 bloom-inference-scripts/bloom-ds-zero-inference.py --name /raid/data/richardwang/bloomz --cpu_offload worked and gave me inference output (/raid/data/richardwang/bloomz is a locally downloaded copy of bigscience/bloomz).
However, python -m inference_server.cli --model_name /raid/data/richardwang/bloomz --model_class AutoModelForCausalLM --dtype bf16 --deployment_framework ds_zero --generate_kwargs '{"min_length": 100, "max_new_tokens": 100, "do_sample": false}' failed with the following error:
```
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/richardwang/transformers-bloom-inference/inference_server/cli.py", line 43, in <module>
    main()
  File "/home/richardwang/transformers-bloom-inference/inference_server/cli.py", line 18, in main
    model = ModelDeployment(args, True)
  File "/home/richardwang/transformers-bloom-inference/inference_server/model_handler/deployment.py", line 54, in __init__
    self.model = get_model_class(args.deployment_framework)(args)
  File "/home/richardwang/transformers-bloom-inference/inference_server/models/ds_zero.py", line 51, in __init__
    self.model = get_hf_model_class(args.model_class).from_pretrained(args.model_name, torch_dtype=args.dtype)
  File "/home/richardwang/venv/llamaenv/lib/python3.8/site-packages/transformers/models/auto/auto_factory.py", line 464, in from_pretrained
    return model_class.from_pretrained(
  File "/home/richardwang/venv/llamaenv/lib/python3.8/site-packages/transformers/modeling_utils.py", line 2357, in from_pretrained
    init_contexts = [deepspeed.zero.Init(config_dict_or_path=deepspeed_config())] + init_contexts
  File "/home/richardwang/venv/llamaenv/lib/python3.8/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 657, in __init__
    _ds_config = deepspeed.runtime.config.DeepSpeedConfig(
  File "/home/richardwang/venv/llamaenv/lib/python3.8/site-packages/deepspeed/runtime/config.py", line 808, in __init__
    self._configure_train_batch_size()
  File "/home/richardwang/venv/llamaenv/lib/python3.8/site-packages/deepspeed/runtime/config.py", line 991, in _configure_train_batch_size
    self._batch_assertion()
  File "/home/richardwang/venv/llamaenv/lib/python3.8/site-packages/deepspeed/runtime/config.py", line 926, in _batch_assertion
    assert (
AssertionError: Train batch size: 0 has to be greater than 0
```
Environment:
torch 1.12.0+cu113
deepspeed 0.8.2
deepspeed-mii 0.0.4
transformers 4.26.1
I have checked that both bloom-inference-scripts/bloom-ds-zero-inference.py and inference_server/models/ds_zero.py call the same thing, AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16). How can I make it work so that a server for bloomz starts up with DeepSpeed inference?
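For comparison, here is roughly what the working standalone script does before loading the model. This is a sketch based on bloom-ds-zero-inference.py, not the server code path, and the config values are illustrative:

```python
import torch
import deepspeed
from transformers import AutoModelForCausalLM
from transformers.deepspeed import HfDeepSpeedConfig

model_name = "/raid/data/richardwang/bloomz"

ds_config = {
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        "offload_param": {"device": "cpu", "pin_memory": True},
    },
    # Without a batch-size entry, DeepSpeed resolves train_batch_size to 0
    # and trips the "Train batch size: 0 has to be greater than 0" assertion.
    "train_micro_batch_size_per_gpu": 1,
}

# Must be created (and kept alive) before from_pretrained so that
# transformers initializes the weights directly into ZeRO-3 partitions.
dschf = HfDeepSpeedConfig(ds_config)

model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
model = deepspeed.initialize(model=model, config_params=ds_config)[0].module
model.eval()
```

So a DeepSpeed config that never sets train_micro_batch_size_per_gpu (or train_batch_size) would produce exactly the assertion above.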
This is weird. I'll look into this one.
Hi @richarddwang,
I also want to load a locally saved bloomz model for inference. However, when I try to load my local bloomz checkpoint, it gives me the following error:
```
Traceback (most recent call last):
  File "bloom-inference-scripts/bloom-ds-zero-inference.py", line 66, in <module>
    tokenizer = AutoTokenizer.from_pretrained(model_name)
  File "/usr/local/lib/python3.7/dist-packages/transformers/models/auto/tokenization_auto.py", line 642, in from_pretrained
    tokenizer_config = get_tokenizer_config(pretrained_model_name_or_path, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/transformers/models/auto/tokenization_auto.py", line 499, in get_tokenizer_config
    _commit_hash=commit_hash,
  File "/usr/local/lib/python3.7/dist-packages/transformers/utils/hub.py", line 420, in cached_file
    local_files_only=local_files_only,
  File "/usr/local/lib/python3.7/dist-packages/huggingface_hub/utils/_validators.py", line 112, in _inner_fn
    validate_repo_id(arg_value)
  File "/usr/local/lib/python3.7/dist-packages/huggingface_hub/utils/_validators.py", line 161, in validate_repo_id
    "Repo id must be in the form 'repo_name' or 'namespace/repo_name':"
huggingface_hub.utils._validators.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/raid/data/richardwang/bloomz'. Use `repo_type` argument if needed.
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 2378) of binary: /usr/bin/python3
```
It seems like the code does not support loading local checkpoints. May I know what change you made to allow it to load local checkpoints? I have tried both bloom-inference-scripts/bloom-ds-inference.py and bloom-inference-scripts/bloom-ds-zero-inference.py; neither works.
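In case it helps narrow things down: transformers only falls back to treating the string as a hub repo id when it is not recognized as a local directory, so a quick check like the sketch below (the path is just the one from this thread) can tell the two cases apart:

```python
import os
from transformers import AutoTokenizer

model_name = "/raid/data/richardwang/bloomz"

# If this directory (or its tokenizer files) is missing on the node that
# runs the script, transformers treats the string as a hub repo id, and
# huggingface_hub then rejects the absolute path with the HFValidationError above.
assert os.path.isdir(model_name), f"not a directory on this node: {model_name}"
for fname in ("config.json", "tokenizer_config.json"):
    print(fname, "present:", os.path.exists(os.path.join(model_name, fname)))

tokenizer = AutoTokenizer.from_pretrained(model_name)
```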
Thanks a lot:)
Hello, I successfully ran DeepSpeed inference with ZeRO stage 3 acceleration on two GPUs, but I found that the two GPUs performed neither data parallelism nor model parallelism: each GPU was asked the same question. How should I solve this problem? Thank you for your reply.
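What I expected was data parallelism, i.e. each rank generating for a different slice of the prompts, along the lines of this sketch (assuming torch.distributed has already been initialized by the deepspeed launcher):

```python
import torch.distributed as dist

rank = dist.get_rank()
world_size = dist.get_world_size()

prompts = ["question 1", "question 2", "question 3", "question 4"]
# Shard the prompts so each GPU handles a different slice; every rank must
# still run the same number of generate calls, otherwise the ZeRO-3
# parameter-gathering collectives will hang.
local_prompts = prompts[rank::world_size]

# ... tokenize local_prompts and call model.generate() as usual, then
# gather the outputs on one rank if a single process needs all of them.
```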
@sevenandseven This repo is no longer being maintained. I suggest using vLLM or TGI.
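For anyone landing here later, a minimal vLLM equivalent looks roughly like this (a sketch; check the vLLM docs for the current API, and swap in a local path such as /raid/data/richardwang/bloomz if needed):

```python
from vllm import LLM, SamplingParams

# Tensor-parallel inference across 2 GPUs; `model` accepts a hub id or a
# local directory.
llm = LLM(model="bigscience/bloomz", tensor_parallel_size=2)

# temperature=0.0 gives greedy decoding, matching do_sample=False above.
params = SamplingParams(max_tokens=100, temperature=0.0)

outputs = llm.generate(["DeepSpeed is a"], params)
print(outputs[0].outputs[0].text)
```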