LLaVA
[Question] Is there a way to speed up the inference?
Question
The title says it all. The 13B model takes more than 10 s to finish streaming the answer for a single image, which is a little too slow for my use case. Are there any known techniques to get this number down? Feel free to just give me pointers; I can look into hacking a solution together myself.
@ohharsen Hello there, did you manage to work out some tricks to speed up the inference?