
Using Bloom with int8 generates unreadable outputs

Open SAI990323 opened this issue 2 years ago • 5 comments

System Info

  • transformers version: 4.26.1
  • Platform: Linux-4.19.91-009.ali4000.alios7.x86_64-x86_64-with-glibc2.27
  • Python version: 3.9.16
  • Huggingface_hub version: 0.12.1
  • PyTorch version (GPU?): 1.12.0+cu113 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?:
  • Using distributed or parallel set-up in script?:

When I use the int8 version of Bloom to generate outputs on 8x Tesla V100 (32GB), I find that all of the tokens generated by the model are "unk". Are there any ideas to help me solve this problem? This phenomenon doesn't appear with the bloom-7b1 model.

Who can help?

@sgugger @muellerzr

Information

  • [X] The official example scripts
  • [ ] My own modified scripts

Tasks

  • [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • [X] My own task or dataset (give details below)

Reproduction

My code is here:

    from transformers import AutoModelForCausalLM, AutoTokenizer

    checkpoint = "model_path"
    max_memory_mapping = {0: "25GB", 1: "25GB", 2: "25GB", 3: "25GB", 4: "25GB", 5: "25GB", 6: "25GB", 7: "25GB"}

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto", max_memory=max_memory_mapping, load_in_8bit=True)

    inputs = tokenizer.encode('''Hello ''', return_tensors="pt").to("cuda")
    outputs = model.generate(inputs, max_new_tokens=10)
    print(tokenizer.decode(outputs[0]))

And the output is "Hello unk unk unk unk unk unk unk unk unk unk "

Expected behavior

I expect the model to output something meaningful, such as "Hello, I am a young woman of 28 years old who has just arrived in New Braunfels for" from the hosted API at https://huggingface.co/bigscience/bloom?text=Hello, or "Hello I am a newbie in python and I am" from int8 inference with the bloom-7b1 model on a single Tesla V100.

SAI990323 avatar Feb 26 '23 16:02 SAI990323

cc @younesbelkada

sgugger avatar Feb 27 '23 07:02 sgugger

The V100 series was not supported by bitsandbytes, but it should be compatible since the 0.37.0 release. What is your bitsandbytes version? Can you try updating bitsandbytes with pip install --upgrade bitsandbytes?
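For reference, a minimal sketch for confirming which bitsandbytes build the running interpreter actually sees, and the GPU's compute capability (assuming bitsandbytes exposes __version__, as recent releases do; otherwise pip show bitsandbytes gives the same information):

    # Check the bitsandbytes version visible to this Python environment.
    import bitsandbytes as bnb
    print(bnb.__version__)

    # int8 support depends on the GPU's compute capability (V100 is sm_70).
    import torch
    print(torch.cuda.get_device_capability(0))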

younesbelkada avatar Feb 27 '23 08:02 younesbelkada

The V100 series was not supported by bitsandbytes, but it should be compatible since the 0.37.0 release. What is your bitsandbytes version? Can you try updating bitsandbytes with pip install --upgrade bitsandbytes?

I have used the latest version, 0.37.0, and the int8 version of bloom-7b1 seems to work well on a single Tesla V100, although it produces repetitions at the end of the outputs.
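As a side note on the trailing repetitions: greedy decoding with a very short prompt often loops. A sketch of generate settings that commonly reduce this (the parameter values below are illustrative, not taken from this thread):

    # Illustrative decoding settings to curb repetition; values are examples only.
    outputs = model.generate(
        inputs,
        max_new_tokens=10,
        do_sample=True,          # sample instead of greedy decoding
        top_p=0.9,               # nucleus sampling
        repetition_penalty=1.2,  # penalize already-generated tokens
    )
    print(tokenizer.decode(outputs[0]))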

SAI990323 avatar Feb 27 '23 11:02 SAI990323

@SAI990323 Are you still facing the issue? Can you try an approach that is similar to: https://github.com/huggingface/transformers/issues/21987#issuecomment-1458231709 and let us know if this works? Also make sure to use bitsandbytes==0.37.1
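The linked comment is not reproduced here; as a sketch, one adjustment commonly tried when 8-bit inference misbehaves on older GPUs is setting the int8 outlier threshold explicitly through a quantization config (the 6.0 value below is illustrative, and BitsAndBytesConfig requires a recent transformers release):

    # Sketch: pass an explicit int8 outlier threshold via the quantization config.
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    checkpoint = "model_path"
    quant_config = BitsAndBytesConfig(load_in_8bit=True, llm_int8_threshold=6.0)

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForCausalLM.from_pretrained(
        checkpoint,
        device_map="auto",
        quantization_config=quant_config,
    )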

younesbelkada avatar Mar 29 '23 15:03 younesbelkada

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar Apr 23 '23 15:04 github-actions[bot]