transformers-bloom-inference
"bloom-ds-zero-inference.py" works but "inference_server.cli --deployment_framework ds_zero" fails
Running deepspeed --num_gpus 4 bloom-inference-scripts/bloom-ds-zero-inference.py --name /raid/data/richardwang/bloomz --cpu_offload worked and gave me inference output (/raid/data/richardwang/bloomz is a locally downloaded copy of bigscience/bloomz).
However, python -m inference_server.cli --model_name /raid/data/richardwang/bloomz --model_class AutoModelForCausalLM --dtype bf16 --deployment_framework ds_zero --generate_kwargs '{"min_length": 100, "max_new_tokens": 100, "do_sample": false}' failed with the following error:
```
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/richardwang/transformers-bloom-inference/inference_server/cli.py", line 43, in <module>
    main()
  File "/home/richardwang/transformers-bloom-inference/inference_server/cli.py", line 18, in main
    model = ModelDeployment(args, True)
  File "/home/richardwang/transformers-bloom-inference/inference_server/model_handler/deployment.py", line 54, in __init__
    self.model = get_model_class(args.deployment_framework)(args)
  File "/home/richardwang/transformers-bloom-inference/inference_server/models/ds_zero.py", line 51, in __init__
    self.model = get_hf_model_class(args.model_class).from_pretrained(args.model_name, torch_dtype=args.dtype)
  File "/home/richardwang/venv/llamaenv/lib/python3.8/site-packages/transformers/models/auto/auto_factory.py", line 464, in from_pretrained
    return model_class.from_pretrained(
  File "/home/richardwang/venv/llamaenv/lib/python3.8/site-packages/transformers/modeling_utils.py", line 2357, in from_pretrained
    init_contexts = [deepspeed.zero.Init(config_dict_or_path=deepspeed_config())] + init_contexts
  File "/home/richardwang/venv/llamaenv/lib/python3.8/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 657, in __init__
    _ds_config = deepspeed.runtime.config.DeepSpeedConfig(
  File "/home/richardwang/venv/llamaenv/lib/python3.8/site-packages/deepspeed/runtime/config.py", line 808, in __init__
    self._configure_train_batch_size()
  File "/home/richardwang/venv/llamaenv/lib/python3.8/site-packages/deepspeed/runtime/config.py", line 991, in _configure_train_batch_size
    self._batch_assertion()
  File "/home/richardwang/venv/llamaenv/lib/python3.8/site-packages/deepspeed/runtime/config.py", line 926, in _batch_assertion
    assert (
AssertionError: Train batch size: 0 has to be greater than 0
```
Environment:
torch 1.12.0+cu113
deepspeed 0.8.2
deepspeed-mii 0.0.4
transformers 4.26.1
I have checked that both bloom-inference-scripts/bloom-ds-zero-inference.py and inference_server/models/ds_zero.py call the same thing, AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16). How can I make it work so that a server for bloomz starts up with DeepSpeed inference?
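For comparison, here is roughly what the working standalone script does before loading the model. This is a sketch based on bloom-ds-zero-inference.py, not the server code path, and the config values are illustrative:

```python
import torch
import deepspeed
from transformers import AutoModelForCausalLM
from transformers.deepspeed import HfDeepSpeedConfig

model_name = "/raid/data/richardwang/bloomz"

ds_config = {
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        "offload_param": {"device": "cpu", "pin_memory": True},
    },
    # Without a batch-size entry, DeepSpeed resolves train_batch_size to 0
    # and trips the "Train batch size: 0 has to be greater than 0" assertion.
    "train_micro_batch_size_per_gpu": 1,
}

# Must be created (and kept alive) before from_pretrained so that
# transformers initializes the weights directly into ZeRO-3 partitions.
dschf = HfDeepSpeedConfig(ds_config)

model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
model = deepspeed.initialize(model=model, config_params=ds_config)[0].module
model.eval()
```

So a DeepSpeed config that never sets train_micro_batch_size_per_gpu (or train_batch_size) would produce exactly the assertion above.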
This is weird. I'll look into this one.
Hi @richarddwang,
I also want to load a locally saved bloomz model for inference. However, when I try to load my local bloomz checkpoint, it gives me the following error:
```
Traceback (most recent call last):
  File "bloom-inference-scripts/bloom-ds-zero-inference.py", line 66, in <module>
    tokenizer = AutoTokenizer.from_pretrained(model_name)
  File "/usr/local/lib/python3.7/dist-packages/transformers/models/auto/tokenization_auto.py", line 642, in from_pretrained
    tokenizer_config = get_tokenizer_config(pretrained_model_name_or_path, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/transformers/models/auto/tokenization_auto.py", line 499, in get_tokenizer_config
    _commit_hash=commit_hash,
  File "/usr/local/lib/python3.7/dist-packages/transformers/utils/hub.py", line 420, in cached_file
    local_files_only=local_files_only,
  File "/usr/local/lib/python3.7/dist-packages/huggingface_hub/utils/_validators.py", line 112, in _inner_fn
    validate_repo_id(arg_value)
  File "/usr/local/lib/python3.7/dist-packages/huggingface_hub/utils/_validators.py", line 161, in validate_repo_id
    "Repo id must be in the form 'repo_name' or 'namespace/repo_name':"
huggingface_hub.utils._validators.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/raid/data/richardwang/bloomz'. Use `repo_type` argument if needed.
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 2378) of binary: /usr/bin/python3
```
It seems like the code does not support loading local checkpoints. May I know what change you made to allow it to load local checkpoints? I have tried both bloom-inference-scripts/bloom-ds-inference.py and bloom-inference-scripts/bloom-ds-zero-inference.py; neither works.
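In case it helps narrow things down: transformers only falls back to treating the string as a hub repo id when it is not recognized as a local directory, so a quick check like the sketch below (the path is just the one from this thread) can tell the two cases apart:

```python
import os
from transformers import AutoTokenizer

model_name = "/raid/data/richardwang/bloomz"

# If this directory (or its tokenizer files) is missing on the node that
# runs the script, transformers treats the string as a hub repo id, and
# huggingface_hub then rejects the absolute path with the HFValidationError above.
assert os.path.isdir(model_name), f"not a directory on this node: {model_name}"
for fname in ("config.json", "tokenizer_config.json"):
    print(fname, "present:", os.path.exists(os.path.join(model_name, fname)))

tokenizer = AutoTokenizer.from_pretrained(model_name)
```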
Thanks a lot:)
Hello, I successfully ran DeepSpeed inference with ZeRO stage 3 acceleration on two GPUs, but I found that the two GPUs performed neither data parallelism nor model parallelism: each GPU was asked the same question. How should I solve this problem? Thank you for your reply.
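What I expected was data parallelism, i.e. each rank generating for a different slice of the prompts, along the lines of this sketch (assuming torch.distributed has already been initialized by the deepspeed launcher):

```python
import torch.distributed as dist

rank = dist.get_rank()
world_size = dist.get_world_size()

prompts = ["question 1", "question 2", "question 3", "question 4"]
# Shard the prompts so each GPU handles a different slice; every rank must
# still run the same number of generate calls, otherwise the ZeRO-3
# parameter-gathering collectives will hang.
local_prompts = prompts[rank::world_size]

# ... tokenize local_prompts and call model.generate() as usual, then
# gather the outputs on one rank if a single process needs all of them.
```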
@sevenandseven This repo is no longer being maintained. I suggest using vLLM or TGI.
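For anyone landing here later, a minimal vLLM equivalent looks roughly like this (a sketch; check the vLLM docs for the current API, and swap in a local path such as /raid/data/richardwang/bloomz if needed):

```python
from vllm import LLM, SamplingParams

# Tensor-parallel inference across 2 GPUs; `model` accepts a hub id or a
# local directory.
llm = LLM(model="bigscience/bloomz", tensor_parallel_size=2)

# temperature=0.0 gives greedy decoding, matching do_sample=False above.
params = SamplingParams(max_tokens=100, temperature=0.0)

outputs = llm.generate(["DeepSpeed is a"], params)
print(outputs[0].outputs[0].text)
```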