DeepSpeed
[BUG] Incorrect Model Output For Contrastive Search
Describe the bug Currently, when using DeepSpeed Inference for GPTJ (but probably other models too) with Huggingface contrastive search, the results are very poor. Apparently the results are also very poor for other sampling methods such as beam search, as can be seen in issue #2506.
To Reproduce Steps to reproduce the behavior:
- Install the latest version of DeepSpeed and Huggingface Transformers
- Load GPTJ with DeepSpeed using FP16
- Attempt to generate any output using contrastive search; this is done by setting do_sample to False, top_k to 4, and penalty_alpha to 0.6
- Note that the output is very poor
- Load GPTJ with just Transformers
- Generate the same output with the same prompt and the same sampling parameters
- Take note that the output is very different and of noticeably better quality (a minimal repro sketch follows this list)
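Roughly, the steps above correspond to a script like the following (the checkpoint name, prompt, and generation length are illustrative placeholders, not taken from my exact setup):

```python
# Minimal repro sketch: same contrastive-search settings with and without DeepSpeed Inference.
import torch
import deepspeed
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EleutherAI/gpt-j-6B"          # assumed checkpoint
prompt = "DeepSpeed is a library that"      # illustrative prompt

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16).to("cuda")

# Contrastive search: do_sample=False, top_k=4, penalty_alpha=0.6
gen_kwargs = dict(do_sample=False, top_k=4, penalty_alpha=0.6, max_new_tokens=128)
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

# 1) Plain Transformers in FP16: output quality is good
baseline = tokenizer.decode(model.generate(**inputs, **gen_kwargs)[0])

# 2) Same model through DeepSpeed Inference with kernel injection: output degrades badly
ds_engine = deepspeed.init_inference(
    model,
    mp_size=1,
    dtype=torch.float16,
    replace_with_kernel_inject=True,
)
degraded = tokenizer.decode(ds_engine.module.generate(**inputs, **gen_kwargs)[0])

print("Transformers:\n", baseline)
print("DeepSpeed:\n", degraded)
```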
Expected behavior I would expect the results to ideally be exactly the same, or at the very least only slightly different while remaining the same quality.
ds_report output
DeepSpeed C++/CUDA extension op report
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meets the required dependencies to JIT install the op.
JIT compiled ops require ninja
ninja .................. [OKAY]
op name ................ installed .. compatible
async_io ............... [YES] ...... [OKAY]
cpu_adagrad ............ [YES] ...... [OKAY]
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [YES] ...... [OKAY]
fused_lamb ............. [YES] ...... [OKAY]
quantizer .............. [YES] ...... [OKAY]
random_ltd ............. [YES] ...... [OKAY]
sparse_attn ............ [YES] ...... [OKAY]
spatial_inference ...... [YES] ...... [OKAY]
transformer ............ [YES] ...... [OKAY]
stochastic_transformer . [YES] ...... [OKAY]
transformer_inference .. [YES] ...... [OKAY]
utils .................. [YES] ...... [OKAY]
DeepSpeed general environment info:
torch install path ............... ['/usr/local/lib/python3.8/dist-packages/torch']
torch version .................... 1.13.1+cu117
deepspeed install path ........... ['/usr/local/lib/python3.8/dist-packages/deepspeed']
deepspeed info ................... 0.8.0+bf6b9802, bf6b9802, HEAD
torch cuda version ............... 11.7
torch hip version ................ None
nvcc version ..................... 11.7
deepspeed wheel compiled w. ...... torch 1.13, cuda 11.7
System info (please complete the following information):
- OS: Ubuntu 20.04
- GPU: RTX 3090
- Transformers version 4.26.0
- Python version 3.8.10
Docker context I am using a Docker image very similar to the one here: https://github.com/mallorbc/Finetune_GPTNEO_GPTJ6B
Additional context
Issue #2506 has great context on problems with many different sampling methods.
I will copy an important comment:
I've done some benchmarks using gpt2 with fp16 precision on my own data (of course ymmv).
System info
- CUDA version 11.7
- A10G instance, 24G
- DeepSpeed 0.7.7
- Transformers 4.25.1
- Python 3.7
- Torch 1.13.1

In summary, with and w/o DeepSpeed:

| Sampling method | Parameters | Score | Latency |
| --- | --- | --- | --- |
| Top-P sampling | top_p = 0.6, temperature = 0.6 | ~1% degradation | ~2x speedup |
| Beam search | beam = 3 | ~14% degradation (w/ some poor generations mixed in) | ~2.5x speedup |
| Contrastive search | top_k = 4, penalty_alpha = 0.6 | ~62% degradation | ~2.8x speedup (partly due to shorter generations) |
| Eta sampling | eta_cutoff = 0.0005 | 0.05% degradation | ~2.2x speedup |

So top-p and eta sampling work great. Beam search and contrastive search degrade significantly.
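For reference, the settings behind each row map to generate() kwargs roughly like this (the prompt and max_new_tokens below are illustrative placeholders, not taken from the benchmark):

```python
# Sketch of the sampling configurations compared above, run on gpt2 in fp16.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2", torch_dtype=torch.float16).to("cuda")
inputs = tokenizer("Hello, my name is", return_tensors="pt").to("cuda")  # illustrative prompt

configs = {
    "top_p":        dict(do_sample=True,  top_p=0.6, temperature=0.6),   # ~1% degradation w/ DeepSpeed
    "beam_search":  dict(do_sample=False, num_beams=3),                  # ~14% degradation
    "contrastive":  dict(do_sample=False, top_k=4, penalty_alpha=0.6),   # ~62% degradation
    "eta_sampling": dict(do_sample=True,  eta_cutoff=0.0005),            # ~0.05% degradation
}

for name, kwargs in configs.items():
    out = model.generate(**inputs, max_new_tokens=64, **kwargs)
    print(name, "->", tokenizer.decode(out[0], skip_special_tokens=True))
```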