transformers-bloom-inference

Fast Inference Solutions for BLOOM

24 issues

OutOfMemoryError: CUDA out of memory. Tried to allocate 62.00 MiB (GPU 6; 79.19 GiB total capacity; 66.51 GiB already allocated; 61.56 MiB free; 67.77 GiB reserved in total by PyTorch)...

`deepspeed --num_gpus 4 bloom-inference-scripts/bloom-ds-zero-inference.py --name /raid/data/richardwang/bloomz --cpu_offload` worked and gave me inference output. `/raid/data/richardwang/bloomz` is a downloaded copy of [bigscience/bloomz](https://huggingface.co/bigscience/bloomz/tree/main). However, `python -m inference_server.cli --model_name /raid/data/richardwang/bloomz --model_class AutoModelForCausalLM --dtype bf16`...

In [`TemporaryCheckpointsJSON`](https://github.com/huggingface/transformers-bloom-inference/blob/main/inference_server/models/ds_inference.py#L80), ![image](https://user-images.githubusercontent.com/5948851/233960651-2b64a5f8-2d8a-4982-88fe-50852381f635.png) when using `glob.glob(f"{self.model_path}/*.bin")`, every file path in the resulting list carries the `model_path` prefix (e.g., when the model name is `bigscience/bloom`): ``` {"type": "BLOOM", "checkpoints": ["bigscience/bloom/pytorch_model.bin"], "version": 1.0} ```...
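A minimal repro of the prefix behavior described above (the path is illustrative; this just demonstrates stock `glob.glob` semantics):

```python
import glob

# glob.glob with a prefixed pattern returns paths that keep the prefix,
# so the generated checkpoints JSON ends up pointing at
# "bigscience/bloom/pytorch_model.bin" instead of "pytorch_model.bin".
model_path = "bigscience/bloom"
print(glob.glob(f"{model_path}/*.bin"))
# ['bigscience/bloom/pytorch_model.bin']  (when such a file exists)
```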

cuBLAS error when running the following HuggingFace `accelerate` benchmark code on NVIDIA H100 HGX (CUDA 12.1, cuDNN 8.8.1, pytorch==2.0.0+cu118) inside a Jupyter notebook: `!CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python transformers-bloom-inference/bloom-inference-scripts/bloom-accelerate-inference.py --name bigscience/bloom --dtype int8 --batch_size`...

All 3 scripts under `bloom-inference-scripts` benchmark the `t_generate_span` time incorrectly: `t_generate_span` is taken from the first `generate()` call (https://github.com/huggingface/transformers-bloom-inference/blob/main/bloom-inference-scripts/bloom-ds-inference.py#L257) rather than from the benchmark cycle.
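A minimal sketch of the measurement this report is asking for (function and variable names here are illustrative, not the scripts' actual code): time `generate()` inside the benchmark cycle, treating the first call as warm-up instead of reusing its timing:

```python
import time

def benchmark_generate(generate, inputs, cycles=5):
    generate(inputs)  # first call: warm-up, excluded from timing
    t0 = time.time()
    for _ in range(cycles):
        generate(inputs)
    t_generate_span = (time.time() - t0) / cycles  # mean latency per call
    return t_generate_span
```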

Hi there, I am trying to use the int8 quantized model of BLOOM 175B for inference and am closely following the `bloom-accelerate-inference.py` script. I have about 1000 prompts for which...
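The question above is cut off, but assuming it is about pushing ~1000 prompts through the int8 model, here is a hedged batching sketch in the spirit of `bloom-accelerate-inference.py` (batch size, generation settings, and the placeholder prompts are all illustrative, not values from the issue):

```python
# Assumes bitsandbytes + accelerate are installed for int8 loading.
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "bigscience/bloom"
tokenizer = AutoTokenizer.from_pretrained(name)
tokenizer.padding_side = "left"  # left-pad for decoder-only generation
model = AutoModelForCausalLM.from_pretrained(
    name, device_map="auto", load_in_8bit=True
)

prompts = ["Hello, my name is"] * 1000  # stand-in for the ~1000 real prompts
outputs = []
for i in range(0, len(prompts), 8):  # illustrative batch size of 8
    batch = tokenizer(prompts[i:i + 8], return_tensors="pt", padding=True)
    batch = batch.to(model.device)
    generated = model.generate(**batch, max_new_tokens=100)
    outputs.extend(tokenizer.batch_decode(generated, skip_special_tokens=True))
```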

When using `glob.glob(f"{self.model_path}/*.bin")`, the file paths in the resulting list all carry the `model_path` prefix, whereas passing `model_path` as `root_dir` yields relative paths, which aligns with how DeepSpeed loads checkpoints ([replace_module.py](https://github.com/microsoft/DeepSpeed/blob/090d49e79fef300046ec0ca22dc3e1bffde74ee1/deepspeed/module_inject/replace_module.py#L567)): ```...
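A small sketch of the proposed change (assuming Python >= 3.10, where `glob.glob` gained the `root_dir` keyword):

```python
import glob

model_path = "bigscience/bloom"

# Prefixed pattern: each result keeps the model_path prefix.
glob.glob(f"{model_path}/*.bin")
# ['bigscience/bloom/pytorch_model.bin']

# root_dir keyword (Python >= 3.10): results are relative to model_path,
# matching the relative checkpoint names DeepSpeed expects.
glob.glob("*.bin", root_dir=model_path)
# ['pytorch_model.bin']
```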

Thank you very much for your work. I ran into a problem running BLOOM-176B on 8x A100. I followed the `README.md` and executed the following command. To be specific, I...

I sent two requests at the same time and am reporting the resulting problem.
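For reproducing this, a hedged sketch that fires two requests concurrently (the endpoint URL and JSON payload here are assumptions, not the server's confirmed API):

```python
import threading
import requests

URL = "http://127.0.0.1:5000/generate/"  # assumed local server address

def send(prompt):
    # Payload shape is an assumption; adjust to the server's actual API.
    resp = requests.post(URL, json={"text": [prompt], "max_new_tokens": 20})
    print(resp.status_code, resp.text)

threads = [threading.Thread(target=send, args=(p,)) for p in ("Hello", "World")]
for t in threads:
    t.start()
for t in threads:
    t.join()
```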