DeepSpeed
[BUG] Incorrect Model Output For Contrastive Search
Describe the bug Currently, when using DeepSpeed Inference for GPTJ (but probably other models too) with Huggingface contrastive search, the results are very poor. Apparently the results are also very poor for other sampling methods such as beam search, as can be seen in issue #2506.
To Reproduce Steps to reproduce the behavior:
- Install the latest version of DeepSpeed and Huggingface Transformers
- Load GPTJ with DeepSpeed using FP16
- Attempt to generate any output using contrastive search; this is done by setting do_sample to False, top_k to 4, and penalty_alpha to 0.6
- Note that the output is very poor
- Load GPTJ with just Transformers
- Generate the same output with the same prompt and the same sampling parameters
- Take note that the output is very different and of noticeably better quality (a minimal repro sketch follows this list)
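Roughly, the steps above correspond to a script like the following (the checkpoint name, prompt, and generation length are illustrative placeholders, not taken from my exact setup):

```python
# Minimal repro sketch: same contrastive-search settings with and without DeepSpeed Inference.
import torch
import deepspeed
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EleutherAI/gpt-j-6B"          # assumed checkpoint
prompt = "DeepSpeed is a library that"      # illustrative prompt

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16).to("cuda")

# Contrastive search: do_sample=False, top_k=4, penalty_alpha=0.6
gen_kwargs = dict(do_sample=False, top_k=4, penalty_alpha=0.6, max_new_tokens=128)
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

# 1) Plain Transformers in FP16: output quality is good
baseline = tokenizer.decode(model.generate(**inputs, **gen_kwargs)[0])

# 2) Same model through DeepSpeed Inference with kernel injection: output degrades badly
ds_engine = deepspeed.init_inference(
    model,
    mp_size=1,
    dtype=torch.float16,
    replace_with_kernel_inject=True,
)
degraded = tokenizer.decode(ds_engine.module.generate(**inputs, **gen_kwargs)[0])

print("Transformers:\n", baseline)
print("DeepSpeed:\n", degraded)
```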
Expected behavior I would expect the results to ideally be exactly the same, or at the very least only slightly different while remaining the same quality.
ds_report output
DeepSpeed C++/CUDA extension op report
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meets the required dependencies to JIT install the op.
JIT compiled ops require ninja
ninja .................. [OKAY]
op name ................ installed .. compatible
async_io ............... [YES] ...... [OKAY]
cpu_adagrad ............ [YES] ...... [OKAY]
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [YES] ...... [OKAY]
fused_lamb ............. [YES] ...... [OKAY]
quantizer .............. [YES] ...... [OKAY]
random_ltd ............. [YES] ...... [OKAY]
sparse_attn ............ [YES] ...... [OKAY]
spatial_inference ...... [YES] ...... [OKAY]
transformer ............ [YES] ...... [OKAY]
stochastic_transformer . [YES] ...... [OKAY]
transformer_inference .. [YES] ...... [OKAY]
utils .................. [YES] ...... [OKAY]
DeepSpeed general environment info:
torch install path ............... ['/usr/local/lib/python3.8/dist-packages/torch']
torch version .................... 1.13.1+cu117
deepspeed install path ........... ['/usr/local/lib/python3.8/dist-packages/deepspeed']
deepspeed info ................... 0.8.0+bf6b9802, bf6b9802, HEAD
torch cuda version ............... 11.7
torch hip version ................ None
nvcc version ..................... 11.7
deepspeed wheel compiled w. ...... torch 1.13, cuda 11.7
System info (please complete the following information):
- OS: Ubuntu 20.04
- GPU: RTX 3090
- Transformers version 4.26.0
- Python version 3.8.10
Docker context I am using a Docker image very similar to the one here: https://github.com/mallorbc/Finetune_GPTNEO_GPTJ6B
Additional context
Issue #2506 has great context on problems with many different sampling methods.
I will copy an important comment:
I've done some benchmarks using gpt2 with fp16 precision on my own data (of course ymmv).
System info
- CUDA version 11.7
- A10G instance, 24G
- DeepSpeed 0.7.7
- Transformers 4.25.1
- Python 3.7
- Torch 1.13.1

In summary, with and w/o DeepSpeed:

| Sampling method | Parameters | Score | Latency |
| --- | --- | --- | --- |
| Top-P sampling | top_p = 0.6, temperature = 0.6 | ~1% degradation | ~2x speedup |
| Beam search | beam = 3 | ~14% degradation (w/ some poor generations mixed in) | ~2.5x speedup |
| Contrastive search | top_k = 4, penalty_alpha = 0.6 | ~62% degradation | ~2.8x speedup (partly due to shorter generations) |
| Eta sampling | eta_cutoff = 0.0005 | 0.05% degradation | ~2.2x speedup |

So top-p and eta sampling work great. Beam search and contrastive search degrade significantly.
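For reference, the settings behind each row map to generate() kwargs roughly like this (the prompt and max_new_tokens below are illustrative placeholders, not taken from the benchmark):

```python
# Sketch of the sampling configurations compared above, run on gpt2 in fp16.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2", torch_dtype=torch.float16).to("cuda")
inputs = tokenizer("Hello, my name is", return_tensors="pt").to("cuda")  # illustrative prompt

configs = {
    "top_p":        dict(do_sample=True,  top_p=0.6, temperature=0.6),   # ~1% degradation w/ DeepSpeed
    "beam_search":  dict(do_sample=False, num_beams=3),                  # ~14% degradation
    "contrastive":  dict(do_sample=False, top_k=4, penalty_alpha=0.6),   # ~62% degradation
    "eta_sampling": dict(do_sample=True,  eta_cutoff=0.0005),            # ~0.05% degradation
}

for name, kwargs in configs.items():
    out = model.generate(**inputs, max_new_tokens=64, **kwargs)
    print(name, "->", tokenizer.decode(out[0], skip_special_tokens=True))
```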