
Better support for GPU and Flash Attention during inference

Open • vikhyat opened this issue 1 year ago • 1 comment

The inference code in this repository forces moondream to run on the CPU. We should let users opt in to GPUs and Flash Attention for faster inference.
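One way to make GPU and Flash Attention opt-in is to decide the device, weight dtype and attention backend up front and fall back to the CPU when something is unavailable. The sketch below is a hypothetical helper, not code from this repository. `InferenceConfig` and `select_inference_config` are illustrative names. The `"flash_attention_2"` string matches the `attn_implementation` value used by Hugging Face `transformers`, but whether moondream's model code honors that argument is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InferenceConfig:
    """Hypothetical bundle of loading options (illustrative only)."""
    device: str                         # "cuda" or "cpu"
    dtype: str                          # name of the torch dtype to load weights in
    attn_implementation: Optional[str]  # assumed value for transformers' attn_implementation


def select_inference_config(cuda_available: bool,
                            flash_attn_installed: bool,
                            use_gpu: bool = True,
                            use_flash_attn: bool = True) -> InferenceConfig:
    """Pick device/dtype/attention backend, falling back to CPU when needed."""
    if use_gpu and cuda_available:
        # Flash Attention kernels run only on CUDA and expect fp16/bf16 weights.
        if use_flash_attn and flash_attn_installed:
            return InferenceConfig("cuda", "float16", "flash_attention_2")
        # GPU without Flash Attention: keep the default attention implementation.
        return InferenceConfig("cuda", "float16", None)
    # CPU path: keep full precision, since fp16 is slow or unsupported on many CPUs.
    return InferenceConfig("cpu", "float32", None)
```

In a real script the two availability flags would come from `torch.cuda.is_available()` and `importlib.util.find_spec("flash_attn") is not None`. The result could then be passed to `from_pretrained(..., torch_dtype=getattr(torch, cfg.dtype), attn_implementation=cfg.attn_implementation)`, with the model moved to `cfg.device`. Keeping the choice in one pure function makes the CPU fallback easy to test without a GPU.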

vikhyat, Jan 24 '24 22:01

Added CUDA support in #22

spartanhaden, Jan 25 '24 12:01