vik
Just added support - here's an example of how it would work!

```python
answers = moondream.batch_answer(
    images=[Image.open(''), Image.open('')],
    prompts=["Describe this image.", "Are there people in this image?"],
    tokenizer=tokenizer,
)
```
...
Thank you for testing! Closing this, please reopen if you run into any issues.
Would help to get examples of more real world use-cases for this. I'm definitely open to adding support for it, just need to understand what types of training data to...
Just added support for Flash Attention; will be pushing to Hugging Face later today. I also tried out 4- and 8-bit quantization, and while it seems to work (the code runs)...
You will have to pass in `attn_implementation="flash_attention_2"` when instantiating the model; it's not enabled out of the box. The change I had to make was adding a flag telling transformers...
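For reference, a minimal sketch of what the load call might look like. The model id `vikhyatk/moondream2` and the helper function are assumptions for illustration, not the exact code from the repo:

```python
# Keyword arguments to enable Flash Attention 2 at load time.
# It must be requested at instantiation; it is not on by default.
FA2_KWARGS = {
    "trust_remote_code": True,
    "attn_implementation": "flash_attention_2",
}

def load_moondream(model_id="vikhyatk/moondream2", **overrides):
    # Hypothetical helper; the model id is an assumption for illustration.
    from transformers import AutoModelForCausalLM  # lazy import
    return AutoModelForCausalLM.from_pretrained(
        model_id, **{**FA2_KWARGS, **overrides}
    )
```

Flash Attention 2 also requires the `flash-attn` package to be installed and a supported GPU, so the call will fail at load time if either is missing.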
Was able to reproduce, thanks! I have a pretty good idea what's causing this, should have it fixed in a few days.
Should be fixed by today's release! Going to close this; please feel free to reopen or open a new issue if anything else comes up!
Just uploaded a notebook that shows how to fine-tune the model. https://github.com/vikhyat/moondream/blob/main/notebooks/Finetuning.ipynb Feel free to reopen if you run into any issues with the script!
Pretty sure this is the OOM killer kicking in when you run out of RAM. I saw someone mention it needs ~9.7GB of RAM available to run on CPU right...
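One way to sanity-check this before loading the model is to read `MemAvailable` from `/proc/meminfo` (Linux only). A small sketch; the ~9.7GB figure is the number reported above, not something I've measured precisely:

```python
def available_ram_gb(meminfo_text):
    """Parse a /proc/meminfo dump and return MemAvailable in GiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            kb = int(line.split()[1])  # value is reported in kB
            return kb / 1024 ** 2
    return None  # field missing (very old kernels)

# On Linux: available_ram_gb(open("/proc/meminfo").read())
sample = "MemTotal:       16000000 kB\nMemAvailable:    8000000 kB\n"
print(round(available_ram_gb(sample), 2))  # ≈ 7.63 GiB
```

If the value is well under ~10 GiB, the kernel OOM killer silently terminating the process (typically exit code 137) is the likely explanation.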
We have support in llama.cpp now, would recommend using that when running on Raspberry Pi. Can you please try it out and let me know if you run into any...