MiniGPT-4
Error with batch inference
Assertion error (out-of-bounds index) when doing inference with a batch size greater than 1:
```
../aten/src/ATen/native/cuda/Indexing.cu:1093: indexSelectSmallIndex: block: [15,0,0], thread: [64,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1093: indexSelectSmallIndex: block: [15,0,0], thread: [65,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1093: indexSelectSmallIndex: block: [15,0,0], thread: [66,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
...
```
The model runs normally when the batch size is 1.
The error occurs in this piece of code:

```python
outputs = model.llama_model.generate(
    inputs_embeds=emb,
    max_new_tokens=max_new_tokens,
    stopping_criteria=stopping_criteria,
    num_beams=num_beams,
    do_sample=True,
    min_length=min_length,
    top_p=top_p,
    repetition_penalty=repetition_penalty,
    length_penalty=length_penalty,
    temperature=temperature,
)
```
How can I run inference in batches?
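For what it's worth, a frequent culprit for this kind of out-of-bounds assertion in batched generation is stacking variable-length prompt embeddings without padding them to a common length and without passing an `attention_mask`. Below is a minimal sketch of that padding step; it is not code from the MiniGPT-4 repo, and `emb_list` and `pad_embed` are hypothetical names standing in for the per-sample embeddings and the embedding of a pad/bos token:

```python
import torch

# A minimal sketch, assuming each sample's prompt embeddings come as a
# [seq_len, hidden] tensor. Left-pad every sample to the batch's max length
# and build the matching 0/1 attention mask, so that generate() can be
# called with a batch of different-length prompts.
def pad_embeds_left(emb_list, pad_embed):
    # emb_list: list of [seq_len, hidden] tensors (one per sample)
    # pad_embed: [1, hidden] tensor, e.g. the embedding of the pad/bos token
    max_len = max(e.shape[0] for e in emb_list)
    embs, masks = [], []
    for e in emb_list:
        pad_len = max_len - e.shape[0]
        embs.append(torch.cat([pad_embed.expand(pad_len, -1), e], dim=0))
        masks.append(torch.cat([torch.zeros(pad_len, dtype=torch.long),
                                torch.ones(e.shape[0], dtype=torch.long)]))
    return torch.stack(embs), torch.stack(masks)

# Tiny self-contained demo with random tensors standing in for real prompts:
emb_list = [torch.randn(5, 8), torch.randn(3, 8)]
pad_embed = torch.zeros(1, 8)
emb, attention_mask = pad_embeds_left(emb_list, pad_embed)
print(emb.shape, attention_mask.shape)  # [2, 5, 8] and [2, 5]

# Both tensors would then go into the call above, e.g.
# outputs = model.llama_model.generate(
#     inputs_embeds=emb, attention_mask=attention_mask, ...)
```

Left-padding (rather than right-padding) matters here because a decoder-only model continues generation from the rightmost position of each sequence.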
I have the same problem. Did you write code that supports batch inference?
What is your `max_new_tokens`? I find that for batch inference, that value should not be too large. For me, 64 works in most cases (though the answer might be truncated).
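Applied to the snippet above, that would just mean lowering the cap (64 here, matching the value that worked for this commenter; all other arguments unchanged):

```python
outputs = model.llama_model.generate(
    inputs_embeds=emb,
    max_new_tokens=64,  # keep this small for batched generation; long answers may be cut off
    stopping_criteria=stopping_criteria,
    num_beams=num_beams,
    do_sample=True,
    min_length=min_length,
    top_p=top_p,
    repetition_penalty=repetition_penalty,
    length_penalty=length_penalty,
    temperature=temperature,
)
```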