2 comments of Zaibei Li
It is not a bug: the demo uses fp16 precision, while llava:34b-v1.6 is 4-bit quantized, so the performance is not comparable.
Sorry, I couldn't confirm whether it is bf16 or fp16. But based on the recommended VRAM from [llava](https://github.com/haotian-liu/LLaVA) (80 GB) versus the 69 GB listed in [Ollama](https://ollama.com/library/llava:34b-v1.6-fp16), it might be bf16.
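
For a like-for-like comparison, one could pull the fp16 tag explicitly and query it through Ollama's local REST API instead of the default (4-bit quantized) tag. A minimal sketch, assuming Ollama is running on its default port (11434), the fp16 variant has been pulled with `ollama pull llava:34b-v1.6-fp16`, and `example.jpg` is a hypothetical test image:

```python
# Minimal sketch: query the explicitly fp16-tagged llava model via
# Ollama's local REST API rather than the default (4-bit quantized) tag.
# Assumes a local Ollama server on port 11434 and that the fp16 tag
# has already been pulled; the image path is a hypothetical placeholder.
import base64
import json
from urllib import request

MODEL = "llava:34b-v1.6-fp16"   # explicit fp16 tag, not the default quantized one
IMAGE_PATH = "example.jpg"      # hypothetical test image

with open(IMAGE_PATH, "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

payload = json.dumps({
    "model": MODEL,
    "prompt": "Describe this image.",
    "images": [image_b64],
    "stream": False,
}).encode("utf-8")

req = request.Request(
    "http://localhost:11434/api/generate",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```

This keeps the weights in 16-bit precision on both sides of the comparison, so any remaining performance gap is not attributable to quantization.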