DeepSpeed
[BUG] After using the code that adds llama inference support, inference results differ from the original model's
When I asked "Who is founder of goolge.com?", LLaMA-13B responded as shown below:
“tro tro tro tro tro tro tro tro tro tro [... "tro" repeated over a hundred more times ...] accur accur accur accur accur accur accur accur accur accur [... "accur" repeated several hundred more times ...] accu....”
It was run on two V100s with the following configuration:
```python
model = deepspeed.init_inference(
    model=model,
    mp_size=2,
    dtype=torch.float16,
    replace_method="auto",
    replace_with_kernel_inject=True,
)
```
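For context, a minimal reproduction sketch of how the model is loaded and queried (the checkpoint path, generation length, and launch command are assumptions, not from the original report):

```python
# Hypothetical repro sketch: load LLaMA-13B via Hugging Face Transformers,
# wrap it with DeepSpeed kernel injection across 2 GPUs, then generate.
# Assumed launch command: deepspeed --num_gpus 2 repro.py
import torch
import deepspeed
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "path/to/llama-13b"  # assumed local checkpoint path
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.float16
)

# Same configuration as in the report: tensor parallelism (mp_size=2)
# over two V100s with automatic kernel injection.
model = deepspeed.init_inference(
    model=model,
    mp_size=2,
    dtype=torch.float16,
    replace_method="auto",
    replace_with_kernel_inject=True,
)

inputs = tokenizer("Who is founder of goolge.com?", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Without `replace_with_kernel_inject=True`, the same script produces coherent text, which suggests the injected kernels (not the model weights) are the source of the divergence.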