DeepSpeed [BUG] GPT-J Inference on batch size > 2 crashes

Describe the bug

GPT-J inference with batch size > 2 crashes with "CUDA error: an illegal memory access was encountered".

To Reproduce

Colab that reproduces this error: https://colab.research.google.com/drive/1VMGpWMUDc4vHMEHL5aJ-4fQ2_w4c8b_Y?usp=sharing

Note that it uses Brendan Dolan-Gavitt's variant of CodeGen-350M (converted to GPT-J format). Colab GPUs can't handle the original GPT-J 6B model, so I'm using this smaller one instead. The same issue also arises with GPT-J 6B (https://twitter.com/abacaj/status/1649889879579344896).

Expected behavior

The CUDA error should not occur.
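
CUDA kernels launch asynchronously, so the Python stack trace for an illegal memory access can point at a later, unrelated call. A minimal sketch of one way to get a more accurate trace is to set the CUDA_LAUNCH_BLOCKING environment variable before CUDA is initialized. This is a general debugging aid, not something the Colab above is known to do.

```python
import os

# Make CUDA kernel launches synchronous so the error is raised at the
# call that actually triggered it. This must be set before CUDA is
# initialized, so do it before importing torch.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# import torch  # then run the failing batched generate() call as usual
```

Synchronous launches slow execution down, so this is only worth enabling while debugging.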

ds_report output

--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
      runtime if needed. Op compatibility means that your system
      meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io: please install the libaio-dev package with apt
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
cpu_adagrad ............ [NO] ....... [OKAY]
cpu_adam ............... [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
random_ltd ............. [NO] ....... [OKAY]
 [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.0
 [WARNING]  using untested triton version (2.0.0), only 1.0.0 is known to be compatible
sparse_attn ............ [NO] ....... [NO]
spatial_inference ...... [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/usr/local/lib/python3.9/dist-packages/torch']
torch version .................... 2.0.0+cu118
deepspeed install path ........... ['/usr/local/lib/python3.9/dist-packages/deepspeed']
deepspeed info ................... 0.9.1, unknown, unknown
torch cuda version ............... 11.8
torch hip version ................ None
nvcc version ..................... 11.8
deepspeed wheel compiled w. ...... torch 2.0, cuda 11.8

System info (please complete the following information):

Tried this on Google Colab (T4) and on two RHEL machines, one with a P100 and one with a V100.

Python version: 3.9
DeepSpeed version: 0.9.1
PyTorch version: 2.0

Docker context

No Docker.

Apr 22 '23 21:04 thakkarparth007

It seems to work without crashing after running !pip install transformers deepspeed==0.9.1

Apr 26 '23 23:04 Mistobaan

The colab link uses the same versions. Does that colab link work for you? I tried this on a couple of systems and was consistently getting this issue. Someone else reproduced this error (please see the linked tweet) as well.

Edit: I'm puzzled, but yes, you're right. It does not seem to be crashing anymore. I wonder what changed.
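
The install command quoted above pins deepspeed but not transformers, so the resolved versions may differ between runs. A small sketch for recording what is actually installed, so a crashing and a non-crashing environment can be compared, might look like this. The helper name installed_versions is hypothetical, and the package list is an assumption about what is relevant here.

```python
from importlib import metadata


def installed_versions(packages=("torch", "transformers", "deepspeed")):
    """Return {package: version string, or None if it is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Saving this output next to each run makes it possible to diff the two environments later instead of guessing.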

Apr 27 '23 00:04 thakkarparth007
