[Question] Is there a way to speed up the inference?

Open • ohharsen opened this issue 2 years ago • 1 comment

Question

Title says it all. The 13B model takes more than 10 s to finish streaming the answer for an image, which is a little too slow for my use case. Are there any known techniques to get this number down? Feel free to give me pointers; I can look into hacking a solution together myself.

ohharsen • Sep 18 '23 17:09

@ohharsen Hello there, did you manage to work out any tricks to speed up the inference?

zjysteven • May 12 '24 20:05