moondream
Better support for GPU and Flash Attention during inference
The inference code provided in this repository forces moondream to run on the CPU. We should let users leverage a GPU and Flash Attention for faster inference if they want to.
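For context, here is a minimal sketch of what configurable device and attention-backend support could look like through the `transformers` API. The model id, function name, and keyword arguments below are illustrative assumptions, not the repository's actual code, and Flash Attention 2 additionally requires the `flash-attn` package and a supported GPU.

```python
# Illustrative sketch only -- not the repository's actual loading code.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "vikhyatk/moondream2"  # assumed Hugging Face model id

def load_model(device: str = "cuda", use_flash_attention: bool = True):
    """Load moondream on the requested device, optionally with Flash Attention 2."""
    kwargs = {"trust_remote_code": True}
    if device == "cuda" and torch.cuda.is_available():
        kwargs["torch_dtype"] = torch.float16
        if use_flash_attention:
            # Requires the flash-attn package and a recent NVIDIA GPU.
            kwargs["attn_implementation"] = "flash_attention_2"
    else:
        device = "cpu"  # fall back to CPU when no GPU is available

    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, **kwargs).to(device)
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
    return model, tokenizer
```

With something like this, CPU remains the default fallback while users with a CUDA device can opt into half precision and Flash Attention for faster inference.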
Added CUDA support in #22