
GPT models hang on large token generation. Lower performance?

Open · mallorbc opened this issue 3 years ago · 1 comment

I am using a 3060 and a 3090 to split GPT models two ways, including GPT-J and GPT-Neo 2.7B. When generating many tokens, say 500, the model hangs and either takes an abnormal amount of time to finish or does not finish at all (I kill it). Generating 50 tokens does not have this issue.
During the hang, the 3090's memory utilization is pinned at 100% while the 3060's stays low.

[Screenshot: GPU utilization during generation]

Subjectively, especially for GPT-J, the results, while not complete gibberish, seem to be of lower quality.
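
For reference, a rough sketch of the setup (my assumption of the invocation, not the exact script; the model name, prompt, and sampling flags are placeholders):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from parallelformers import parallelize

model_name = "EleutherAI/gpt-neo-2.7B"  # also reproduced with GPT-J
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Split the model across the two GPUs (3060 + 3090) with parallelformers.
parallelize(model, num_gpus=2, fp16=True, verbose="detail")

inputs = tokenizer("Once upon a time", return_tensors="pt")

# Generating ~50 new tokens finishes normally; around 500 is where the hang appears.
outputs = model.generate(
    **inputs,
    max_new_tokens=500,
    do_sample=True,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```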

mallorbc · Dec 15, 2021

Might this be a race condition between the two GPUs?

mallorbc · Jan 5, 2022