
Fast Inference Solutions for BLOOM

24 transformers-bloom-inference issues, sorted by recently updated

I am trying to create a simple chatbot using the bloom-7b1 model (I may use bigger models later), based on bloom-ds-zero-inference.py. Here is my code:

```python
import json
import os
from pathlib import ...
```
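
For readers after the same thing, here is a minimal chat-loop sketch, not the issue author's script: it assumes plain `transformers` + `accelerate` loading (`device_map="auto"`) and skips the DeepSpeed ZeRO launch details that bloom-ds-zero-inference.py handles.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "bigscience/bloom-7b1"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(
    name, torch_dtype=torch.bfloat16, device_map="auto"
)

history = ""
while True:
    history += f"User: {input('You: ')}\nBot:"
    inputs = tokenizer(history, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.9)
    # decode only the newly generated tokens and cut at the next "User:" turn
    reply = tokenizer.decode(
        out[0][inputs.input_ids.shape[1]:], skip_special_tokens=True
    ).split("User:")[0].strip()
    print("Bot:", reply)
    history += f" {reply}\n"
```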

In the README, the following command is provided to install the required dependencies: `pip install flask flask_api gunicorn pydantic accelerate huggingface_hub>=0.9.0 deepspeed>=0.7.3 deepspeed-mii==0.0.2`. But when executing this command in a shell,...
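
The report is truncated, but one plausible cause (an assumption, not confirmed by the issue): an unquoted `>=` is parsed by the shell as output redirection, so the version-pinned packages need quoting, e.g. `pip install flask flask_api gunicorn pydantic accelerate "huggingface_hub>=0.9.0" "deepspeed>=0.7.3" deepspeed-mii==0.0.2`.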

```
(gh_transformers-bloom-inference) amd00@MZ32-00:~/llm_dev/transformers-bloom-inference$ python bloom-inference-scripts/bloom-accelerate-inference.py --name ~/hf_model/bloom --batch_size 1 --benchmark
Using 0 gpus
Loading model /home/amd00/hf_model/bloom
Traceback (most recent call last):
  File "/home/amd00/llm_dev/transformers-bloom-inference/bloom-inference-scripts/bloom-accelerate-inference.py", line 49, in <module>
    tokenizer = AutoTokenizer.from_pretrained(model_name)
  File "/home/amd00/anaconda3/envs/gh_transformers-bloom-inference/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py",...
```
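
The traceback is cut off, but since `AutoTokenizer.from_pretrained` is failing on a local path, a sanity check on the checkpoint directory can help; a sketch, assuming the problem is missing tokenizer files rather than something else (the `Using 0 gpus` line also suggests PyTorch cannot see the GPUs, which is worth checking separately):

```python
from pathlib import Path
from transformers import AutoTokenizer

model_path = Path.home() / "hf_model" / "bloom"  # the local checkpoint directory

# AutoTokenizer needs tokenizer files next to the weights; list what is missing
# before loading so the failure is easier to diagnose
expected = ["tokenizer.json", "tokenizer_config.json", "config.json"]
missing = [name for name in expected if not (model_path / name).exists()]
if missing:
    raise FileNotFoundError(f"{model_path} is missing {missing}")

tokenizer = AutoTokenizer.from_pretrained(str(model_path))
```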

I am using multiple GPUs to quantize the model and run inference with deepspeed==0.9.0, but it fails. Device: RTX-3090 x 8. Server Docker: [nvidia-pytorch-container](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch/tags) with tag `22.07-py3`. Then git clone this codebase...
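
For reference, the kernel-injected int8 path looks roughly like the following; a simplified sketch of the pattern in bloom-inference-scripts/bloom-ds-inference.py, not a drop-in fix for the failure above (the repo's script additionally pairs int8 with the pre-quantized microsoft/bloom-deepspeed-inference-int8 checkpoint via a checkpoint JSON):

```python
# launch with: deepspeed --num_gpus 8 this_script.py
import os

import deepspeed
import torch
from transformers import AutoModelForCausalLM

world_size = int(os.getenv("WORLD_SIZE", "1"))

model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom-7b1", torch_dtype=torch.bfloat16
)
model = deepspeed.init_inference(
    model,
    mp_size=world_size,              # tensor-parallel degree across the GPUs
    dtype=torch.int8,                # int8 kernels (see the caveat above)
    replace_with_kernel_inject=True, # swap in DeepSpeed's fused kernels
)
```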

When using Falcon-40B with `bloom-accelerate-inference.py`, the first error I get is "ValueError: The following model_kwargs are not used by the model: ['token_type_ids'] (note: typos in the generate arguments will...
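
A common workaround for this particular ValueError (a sketch, not a change from this repo): Falcon's tokenizer returns `token_type_ids`, which the Falcon causal-LM `generate()` does not accept, so drop the key before generating.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "tiiuae/falcon-40b"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(
    name, torch_dtype=torch.bfloat16, device_map="auto"
)

inputs = tokenizer("Hello, Falcon", return_tensors="pt").to(model.device)
inputs.pop("token_type_ids", None)  # the kwarg the model complains about
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```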

Why is it said that only ds_zero currently runs world_size streams on world_size GPUs? Shouldn't accelerate and ds-inference be doing the same, since they also...

After executing `make gen-proto` and `make bloom-560m`, I observed that the generated text is just a continuation of the input text. Is it possible to modify it to have a conversational style?...
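
One hypothetical way to get chat-style output from a plain language-model server is to wrap each request in a running dialogue prompt; in the sketch below, the endpoint URL and the payload/response keys are assumptions about the repo's Flask server, so adjust them to match your deployment:

```python
import requests

history: list[str] = []

def chat(user_turn: str) -> str:
    # assemble the whole conversation so far into one prompt
    history.append(f"User: {user_turn}")
    prompt = "\n".join(history) + "\nAssistant:"
    resp = requests.post(
        "http://127.0.0.1:5000/generate/",          # assumed endpoint
        json={"text": [prompt], "max_new_tokens": 64},
        timeout=120,
    )
    # keep only the assistant's turn, cutting at the next "User:" marker
    reply = resp.json()["text"][0].split("User:")[0].strip()
    history.append(f"Assistant: {reply}")
    return reply

print(chat("Hi, who are you?"))
```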

The command `make bloom-560m` executed successfully on CentOS. However, after entering text in the browser and clicking the submit button, the page kept displaying the message "Processing"...
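
To tell a stuck UI apart from a stuck backend, one option is to call the generation endpoint directly and time it; as above, the URL and payload are assumptions based on this repo's Flask server:

```python
import time

import requests

start = time.time()
resp = requests.post(
    "http://127.0.0.1:5000/generate/",        # assumed endpoint
    json={"text": ["hello"], "max_new_tokens": 16},
    timeout=300,
)
# a slow-but-successful response points at the UI; a hang or non-200 status
# points at the server (or a firewall between browser and host)
print(f"{resp.status_code} in {time.time() - start:.1f}s: {resp.text[:200]}")
```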

I was testing DeepSpeed BLOOM inference with bloom-7b1 using 3 GPUs; this error did not occur when I ran the accelerate inference. Command line:

```
deepspeed --include localhost:2,5,6 --module inference_server.benchmark...
```
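
The truncated log does not show the error itself, but a likely suspect with 3 ranks (an assumption about this report): DeepSpeed's tensor parallelism generally needs the attention-head count to divide evenly across GPUs, and bloom-7b1 has 32 heads, which 3 GPUs cannot split. A quick check:

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("bigscience/bloom-7b1")
world_size = 3  # --include localhost:2,5,6
assert config.n_head % world_size == 0, (
    f"{config.n_head} attention heads cannot be sharded across {world_size} GPUs; "
    "try 1, 2, 4, or 8 ranks instead"
)
```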

Where can I download bloom-7b? I noticed that int8 quantization is available, but is there an option for int4 quantization? What is the memory overhead for int4 and int8 when...
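
On the download question: the 7B-class checkpoint is published on the Hugging Face Hub as bigscience/bloom-7b1. On memory, a rough weights-only estimate (a sketch that ignores activations, the KV cache, and quantization overhead such as scales):

```python
# weights-only memory: bytes ≈ parameter_count × bits_per_weight / 8
def weight_memory_gib(params_billion: float, bits: int) -> float:
    return params_billion * 1e9 * bits / 8 / 2**30

for bits in (16, 8, 4):
    print(f"bloom-7b1 at {bits}-bit: ~{weight_memory_gib(7.1, bits):.1f} GiB")
```

For bloom-7b1 this works out to roughly 13.2 GiB at 16-bit, 6.6 GiB at int8, and 3.3 GiB at int4, before any runtime overhead.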