vik
This should be possible now with the ONNX client implementation. https://github.com/vikhyat/moondream/tree/7a8bbcacd94e9f52f60b06eeeeafadfbd92b123a/moondream/clients/python
Yes - the current version of moondream can detect one object per image. If you query with `Bounding box: {object}` it will return an array of 4 floating point numbers...
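A minimal sketch of building that query and reading back the four numbers. It assumes the model's reply arrives as text containing the four floats (for example a bracketed list); that reply format is an assumption. `build_prompt` and `parse_four_floats` are hypothetical helper names, not part of moondream. The sketch deliberately does not interpret the values as particular coordinates.

```python
import re
from typing import List

# Matches plain or scientific-notation decimal numbers such as 0.25, .5, 1, 3e-2.
_FLOAT = re.compile(r"[-+]?(?:\d+\.\d*|\.\d+|\d+)(?:[eE][-+]?\d+)?")


def build_prompt(obj: str) -> str:
    """Fill the `Bounding box: {object}` query template from the comment above."""
    return f"Bounding box: {obj}"


def parse_four_floats(reply: str) -> List[float]:
    """Extract exactly four floats from a text reply (assumed format).

    Raises ValueError if the reply does not contain exactly four numbers,
    so a malformed answer is not silently treated as a box.
    """
    values = [float(token) for token in _FLOAT.findall(reply)]
    if len(values) != 4:
        raise ValueError(f"expected 4 numbers, found {len(values)}: {reply!r}")
    return values
```

Usage: send `build_prompt("dog")` as the question, then pass the model's text answer to `parse_four_floats`. Failing loudly on anything other than four numbers keeps a garbled answer from propagating as a bogus box.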
For most tasks we see very little benefit from finetuning the vision encoder, and for some tasks we actually see worse performance. Unless the dataset is 100k+ images I would...
Are you looking for image embeddings, or text embeddings?
Currently it only returns gaze if it thinks the gaze point is within the image. There's a setting you can use to change that behavior: https://github.com/vikhyat/moondream/blob/main/moondream/torch/moondream.py#L541-L546 Considering changing it to...
There's a recipe for running it on videos here: https://github.com/vikhyat/moondream/tree/main/recipes/gaze-detection-video
Will be releasing later this week; I will reply here when it’s out. Future releases won’t be delayed this much. I underestimated how much work porting our attention change...
Hey, you should be able to set up DDP or FSDP for multi-GPU training. The finetuning scripts we provide are barebones simplified versions that people can adapt to their setup,...
> Added a check before "torch._dynamo.mark_dynamic" which prevents it from running on Mac. Why does this need to be blocked?
I would not recommend using the Ollama version right now; it only supports a very old version of the model (from April). I need to reach out to them to figure...