
Error with batch inference

Open · insundaycathy opened this issue 1 year ago · 3 comments

I get an assertion (out-of-bounds indexing) error when doing inference with a batch size greater than 1:

    ../aten/src/ATen/native/cuda/Indexing.cu:1093: indexSelectSmallIndex: block: [15,0,0], thread: [64,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:1093: indexSelectSmallIndex: block: [15,0,0], thread: [65,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:1093: indexSelectSmallIndex: block: [15,0,0], thread: [66,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ...

The model runs normally when batch size = 1.

The error occurs in this piece of code:

    outputs = model.llama_model.generate(
        inputs_embeds=emb,
        max_new_tokens=max_new_tokens,
        stopping_criteria=stopping_criteria,
        num_beams=num_beams,
        do_sample=True,
        min_length=min_length,
        top_p=top_p,
        repetition_penalty=repetition_penalty,
        length_penalty=length_penalty,
        temperature=temperature,
    )

insundaycathy · Jun 09 '23 01:06
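This assertion fires when an index handed to a GPU embedding/index_select lookup is out of range. With batched inputs_embeds, one common cause is stacking prompts of different lengths without padding them to a common length and passing the matching attention_mask, so shorter rows end up with invalid positions. Below is a minimal sketch of batched generation under that assumption; pad_batch, embs, and pad_emb are illustrative names, not part of the MiniGPT-4 codebase:

```python
import torch

# Illustrative helper (not from the MiniGPT-4 repo): left-pad a list of
# per-sample prompt embeddings, each of shape (seq_len_i, hidden_dim), to a
# common length and build the matching attention mask.
def pad_batch(embs, pad_emb):
    max_len = max(e.shape[0] for e in embs)
    padded, masks = [], []
    for e in embs:
        n_pad = max_len - e.shape[0]
        pad = pad_emb.unsqueeze(0).expand(n_pad, -1)   # (n_pad, hidden_dim)
        padded.append(torch.cat([pad, e], dim=0))      # left-pad: real tokens at the end
        masks.append(torch.cat([torch.zeros(n_pad, device=e.device),
                                torch.ones(e.shape[0], device=e.device)]))
    return torch.stack(padded), torch.stack(masks).long()

# pad_emb can be the embedding of the tokenizer's pad (or eos) token, e.g.:
# pad_emb = model.llama_model.get_input_embeddings()(
#     torch.tensor(tokenizer.pad_token_id, device=embs[0].device))

emb, attention_mask = pad_batch(embs, pad_emb)
outputs = model.llama_model.generate(
    inputs_embeds=emb,
    attention_mask=attention_mask,  # marks padded positions so they are ignored
    max_new_tokens=max_new_tokens,
    stopping_criteria=stopping_criteria,
    num_beams=num_beams,
    do_sample=True,
    min_length=min_length,
    top_p=top_p,
    repetition_penalty=repetition_penalty,
    length_penalty=length_penalty,
    temperature=temperature,
)
```

Left-padding matters for decoder-only models such as LLaMA: generation continues from the last position, so the real prompt has to sit at the end of each row.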

How do you run inference in batches?

ghost · Jun 28 '23 02:06

I have the same problem. Have you written code that supports batch inference?

pixas · Aug 31 '23 05:08

What is your max_new_tokens? I find that for batch inference this value should not be too large. For me, 64 works in most cases (though the answer might be truncated).

joslefaure · Jan 25 '24 12:01
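In code, that suggestion is just a cap on max_new_tokens in the same generate call; the 64 here is the empirical value from the comment above, not a documented limit:

```python
outputs = model.llama_model.generate(
    inputs_embeds=emb,
    max_new_tokens=64,  # keep small for batched inference; longer answers may be cut off
    num_beams=num_beams,
    do_sample=True,
)
```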