transformers-bloom-inference
The generated results are different when using greedy search during generation
Thank you very much for your work. I ran into a problem when running BLOOM-176B on 8*A100.
I followed the README.md and executed the command below. Specifically, I set do_sample = true and top_k = 1, which I thought was equivalent to greedy search:
python -m inference_server.cli --model_name bigscience/bloom --model_class AutoModelForCausalLM --dtype bf16 --deployment_framework hf_accelerate --generate_kwargs '{"min_length": 100, "max_new_tokens": 100, "do_sample": true, "top_k": 1}'
However, the outputs generated across several forward passes with the same inputs were sometimes different. This only happened occasionally.
Do you have any clues or ideas about this?
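For reference, here is a minimal sketch of the two settings expressed directly through the transformers API (this is not the repo's inference_server code, and bigscience/bloom-560m is only an assumed small stand-in checkpoint for a quick local check):

```python
# Sketch only: compare "sampling with top_k=1" against plain greedy search.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "bigscience/bloom-560m"  # assumption: small checkpoint for local testing
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

inputs = tokenizer("DeepSpeed is a machine learning framework", return_tensors="pt")

# Setting A: what I used -- sampling restricted to the single most likely token
out_sampled = model.generate(**inputs, max_new_tokens=100, do_sample=True, top_k=1)

# Setting B: plain greedy search
out_greedy = model.generate(**inputs, max_new_tokens=100, do_sample=False)

print(tokenizer.decode(out_sampled[0]))
print(tokenizer.decode(out_greedy[0]))
```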
My env info:
CUDA 11.7
nccl 2.14.3
accelerate 0.17.1
Flask 2.2.3
Flask-API 3.0.post1
gunicorn 20.1.0
pydantic 1.10.6
huggingface-hub 0.13.2
Hi, do_sample = true with top_k = 1 should be fine, but the correct way to do greedy search is simply do_sample = False.
This is weird. I don't think this is a bug in the code in this repository.
But I will try to give it a shot.
Can you try with just do_sample = False?
Hi @mayank31398, sorry for the late reply.
It was ok with do_sample=False. The results were all the same.
But I still can't figure out why sampling doesn't work properly. Do you know whom or which repo I could turn to for help?
Refer to https://huggingface.co/blog/how-to-generate. Sampling is designed to incorporate randomness into picking the next word.
But here k is 1, so there shouldn't be any randomness. @richarddwang
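For what it's worth, in exact arithmetic top_k = 1 should indeed collapse to greedy: the top-k filter leaves only one token with non-zero probability, so multinomial sampling has a single possible outcome. A toy sketch in plain PyTorch (not the transformers internals) to illustrate:

```python
# After top-k filtering with k=1, only the argmax token keeps non-zero
# probability, so multinomial sampling always returns that token.
import torch

logits = torch.tensor([2.0, 0.5, 3.1, -1.0])       # toy next-token logits
topk_vals, topk_idx = logits.topk(k=1)              # keep only the best token
filtered = torch.full_like(logits, float("-inf"))   # mask out everything else
filtered[topk_idx] = topk_vals
probs = torch.softmax(filtered, dim=-1)             # one entry is 1.0, rest are 0.0
sampled = torch.multinomial(probs, num_samples=1)   # the only possible outcome
assert sampled.item() == logits.argmax().item()
```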