Cannot reproduce Web Playground results with a locally running model
I see a difference in performance between the Playground model and the same model (moondream-2b-int8.mf) run locally, using the same prompt, 'head'.
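For reference, this is roughly how I run the local detection (a minimal sketch assuming the `moondream` Python client, which returns detections with normalized `x_min`/`y_min`/`x_max`/`y_max` coordinates; the `format_box` helper is my own, and the model call is commented out since it needs the downloaded weights):

```python
# Hypothetical repro sketch; requires `pip install moondream` and the
# moondream-2b-int8.mf weights, so the model call is left commented out.

def format_box(obj):
    """Render a detection dict with normalized coordinates for comparison."""
    return "({:.3f}, {:.3f}) -> ({:.3f}, {:.3f})".format(
        obj["x_min"], obj["y_min"], obj["x_max"], obj["y_max"])

# import moondream as md
# from PIL import Image
#
# model = md.vl(model="moondream-2b-int8.mf")
# result = model.detect(Image.open("photo.jpg"), "head")
# for obj in result["objects"]:
#     print(format_box(obj))
```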
Issue 1: I don't even get the same number of detections when I compare the Playground and local model.
Playground output:
Locally running model:
Issue 2: There is a slight difference in the size of bounding boxes between the Playground and local model.
Playground output:
Locally running model:
In all cases, the Playground model seems to perform better.
- Which model is used in the Playground?
- Why is there a difference in size of the bounding boxes?
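To make the box-size comparison concrete, an intersection-over-union helper can quantify how much a Playground box and a local box actually differ (a generic sketch, not part of the moondream client; boxes are assumed to be `(x_min, y_min, x_max, y_max)` tuples in the same coordinate system):

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))  # overlap width
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))  # overlap height
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Identical boxes give 1.0; disjoint boxes give 0.0.
print(iou((0.1, 0.1, 0.5, 0.5), (0.1, 0.1, 0.5, 0.5)))  # 1.0
```

An IoU close to 1.0 means the "slight difference in size" is negligible; values well below 1.0 would point to a genuinely different model or post-processing step.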
I'm facing the same issue with the query endpoint. I tried all models locally and the results are incorrect, while the playground web version and the API work exactly as expected. I would also like to know the model and configuration (if any) used in the playground and API.
Thanks.
I also noticed that the results were different, which is why I tried to set up moondream2 with transformers: the model here (huggingface moondream2) is larger, and I suspected that might be the reason (maybe higher precision, which may give better results?). I also had to contribute to issue 235 because there were problems getting it running. In the end it worked (thanks to the help of @parsakhaz); I describe my solution in the issue. But inference takes a very long time on CPU, and I can't tell that the results are much better, especially compared to the API.
In the meantime I've gone back to the API, since the daily request limits are very generous (thanks to @vikhyat).
Nevertheless, I would be very interested in how this can be explained and which model is actually behind the API. That would be exciting to know, and it would also be good to know whether that model will be available for local use.
Thanks!
> I also noticed that the results were different, which is why I tried to set up moondream2 with transformers, because the model here: huggingface moondream2 is larger and I had suspected that this might be a reason

correct, this is a larger model & our latest model release - client libraries don't support our latest release but will soon.

> (maybe higher precision which may give better results?), but I also had to contribute to this issue issue 235 because there were problems getting it running. In the end it worked (thanks to the help of @parsakhaz), I describe my solution in the issue, but it takes a very long time for inference (CPU) and I can't tell that results are much better especially compared to API?
no problem - CPU will be slower for the larger transformers-based model, especially compared to the quantized ONNX models.
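As an aside on why a quantized int8 model can drift from the full-precision one: each weight is rounded to one of 255 levels, and those small per-weight errors compound through the network, which can shift detection counts and box edges slightly. A toy illustration of the rounding step (generic symmetric int8 quantization in plain Python, not moondream's actual scheme):

```python
def quantize_int8(values):
    """Symmetric int8 quantization: map floats onto [-127, 127] and back."""
    scale = max(abs(v) for v in values) / 127.0
    q = [round(v / scale) for v in q_src] if False else [round(v / scale) for v in values]  # stored as int8
    return [qi * scale for qi in q], scale  # dequantized values and the step size

weights = [0.137, -0.902, 0.441, 0.0035, -0.256]
deq, scale = quantize_int8(weights)
errors = [abs(w - d) for w, d in zip(weights, deq)]
# Per-weight rounding error is bounded by half a quantization step.
assert max(errors) <= scale / 2 + 1e-12
```

So even if the local file were the exact same checkpoint as the hosted one, bit-identical outputs would not be expected from an int8 build.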
> In the meantime I use the API again, because there are also very generous daily request rates (thanks to @vikhyat).
nice, let us know if you need your query limit bumped up!
> Nevertheless, I would also be very interested in how this can be explained and which model is actually behind the API. That would be exciting to know and it would also be good to know whether this model will also be available for a local usage?
our API has the same version of moondream as the Cloud :) meaning you get access to the latest unreleased version of the model by using our API
> Thanks!
Many thanks for the reply, @parsakhaz! Good to know, and sure, a longer inference time on CPU is normal.
Nevertheless, the question remains why the results differ, as @jthecodemonk shows above and others are seeing too. I would assume that if the Cloud/API model is the same one linked here in the repo and available on Hugging Face via transformers, the results should not differ so much when inference is performed locally. Anyway, I can't replicate the playground results locally and don't get why.
> Many thanks for the reply! @parsakhaz Good to know and sure, longer time for inference on CPU is normal.
> Nevertheless, the question remains why the results differ, as @jthecodemonk shows above and some see similar? I would assume that if Cloud/API is the same model as linked here in the repo or available on huggingface/via transformers, then the results should not differ so much when inference is performed locally? Anyways, I can't replicate the playground results locally and don't get why.
Sorry, I could have been more specific - the playground and API models are using a version of Moondream that has not been publicly released yet - but will be soon!
@parsakhaz any idea on when it will be publicly released? Thank you.
Please post an update here when the new version is publicly released.