gpt-fast
Overly long input texts cause a device-side assert to be triggered
<frozen importlib._bootstrap_external>:843: _call_with_frames_removed: block: [8,0,0], thread: [58,0,0] Assertion `index out of bounds: 0 <= tmp68 < 1504` failed.
<frozen importlib._bootstrap_external>:843: _call_with_frames_removed: block: [8,0,0], thread: [59,0,0] Assertion `index out of bounds: 0 <= tmp68 < 1504` failed.
<frozen importlib._bootstrap_external>:843: _call_with_frames_removed: block: [8,0,0], thread: [60,0,0] Assertion `index out of bounds: 0 <= tmp68 < 1504` failed.
<frozen importlib._bootstrap_external>:843: _call_with_frames_removed: block: [8,0,0], thread: [61,0,0] Assertion `index out of bounds: 0 <= tmp68 < 1504` failed.
<frozen importlib._bootstrap_external>:843: _call_with_frames_removed: block: [8,0,0], thread: [62,0,0] Assertion `index out of bounds: 0 <= tmp68 < 1504` failed.
<frozen importlib._bootstrap_external>:843: _call_with_frames_removed: block: [8,0,0], thread: [63,0,0] Assertion `index out of bounds: 0 <= tmp68 < 1504` failed.
...
RuntimeError: CUDA error: device-side assert triggered
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
When I use llama-2-7b-hf with ten input samples, each around 2048 tokens long after tokenization, I still encounter the above error, even though I have set the block size to 4096.
If I shorten the inputs, it works.
Any idea how to use this for a long-context example? It seems that max_seq_len > 2048 triggers the error above.
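As a stopgap while debugging the root cause, one workaround is to clip each tokenized prompt so that the prompt plus the tokens to be generated fit inside the sequence length the KV cache was allocated for. This is only a sketch: the 2048 limit is an assumption based on the lengths that fail above, and `clip_prompts` is a hypothetical helper, not part of gpt-fast; dummy integer lists stand in for real tokenizer output.

```python
def clip_prompts(prompts, max_seq_len=2048, reserve_new_tokens=256):
    """Truncate each tokenized prompt so that prompt length plus
    reserve_new_tokens stays within the assumed max_seq_len."""
    budget = max_seq_len - reserve_new_tokens
    # Keep the *end* of each prompt, which usually carries the actual question.
    return [p[-budget:] if len(p) > budget else p for p in prompts]

# Dummy token ids standing in for tokenizer output.
prompts = [list(range(3000)), list(range(100))]
clipped = clip_prompts(prompts)
print([len(p) for p in clipped])  # → [1792, 100]
```

This avoids the out-of-bounds index at the cost of dropping the start of long prompts; it does not explain why setting the block size to 4096 fails to lift the limit.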