Using Bloom with int8 generate unreadable outputs
System Info
- `transformers` version: 4.26.1
- Platform: Linux-4.19.91-009.ali4000.alios7.x86_64-x86_64-with-glibc2.27
- Python version: 3.9.16
- Huggingface_hub version: 0.12.1
- PyTorch version (GPU?): 1.12.0+cu113 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?:
- Using distributed or parallel set-up in script?:
When I use the int8 version of Bloom to generate outputs on 8x Tesla V100 (32GB), all of the tokens generated by the model are "unk". Are there any ideas that could help me solve this problem? This phenomenon doesn't appear with the bloom-7b1 model.
Who can help?
@sgugger @muellerzr
Information
- [X] The official example scripts
- [ ] My own modified scripts
Tasks
- [ ] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [X] My own task or dataset (give details below)
Reproduction
My code is here.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "model_path"
max_memory_mapping = {0: "25GB", 1: "25GB", 2: "25GB", 3: "25GB", 4: "25GB", 5: "25GB", 6: "25GB", 7: "25GB"}

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto", max_memory=max_memory_mapping, load_in_8bit=True)

inputs = tokenizer.encode('''Hello ''', return_tensors="pt").to("cuda")
outputs = model.generate(inputs, max_new_tokens=10)
print(tokenizer.decode(outputs[0]))
```
And the output is "Hello unk unk unk unk unk unk unk unk unk unk "
Expected behavior
I expect the model to output meaningful results, such as "Hello, I am a young woman of 28 years old who has just arrived in New Braunfels for" from the hosted API at https://huggingface.co/bigscience/bloom?text=Hello, or "Hello I am a newbie in python and I am" from int8 inference with the "bloom-7b1" model on a single Tesla V100.
cc @younesbelkada
The V100 series was not supported by `bitsandbytes`, but it should be compatible since the 0.37.0 release. What is your `bitsandbytes` version? Can you try updating it with `pip install --upgrade bitsandbytes`?
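For reference, a quick sanity check of the installed version (a minimal sketch using only the standard library; nothing beyond the `bitsandbytes` package name is assumed):

```python
# Print the installed bitsandbytes version; it should report 0.37.0 or newer
# for int8 support on V100-class GPUs.
from importlib.metadata import version

print(version("bitsandbytes"))
```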
I am already using the latest version, 0.37.0, and the int8 version of "bloom-7b1" seems to work well on a single Tesla V100, although it has repetitions at the end of the outputs.
@SAI990323
Are you still facing the issue? Can you try an approach similar to https://github.com/huggingface/transformers/issues/21987#issuecomment-1458231709 and let us know if it works?
Also make sure to use `bitsandbytes==0.37.1`.
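For completeness, here is a rough sketch of what such a retry could look like. It is not necessarily the exact approach from the linked comment: it assumes a `transformers` version recent enough to expose `BitsAndBytesConfig`, and `llm_int8_threshold` is shown at its default value only to make the quantization settings explicit.

```python
# Hedged sketch: pin bitsandbytes and load Bloom through an explicit
# quantization config instead of the bare load_in_8bit flag.
# Assumes: pip install --upgrade transformers bitsandbytes==0.37.1
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

checkpoint = "model_path"  # placeholder path, as in the original report

quant_config = BitsAndBytesConfig(
    load_in_8bit=True,
    llm_int8_threshold=6.0,  # default outlier threshold, made explicit here
)

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(
    checkpoint,
    device_map="auto",
    quantization_config=quant_config,
)

inputs = tokenizer("Hello ", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(outputs[0]))
```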
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.