Any known issue with 2025-01-09 on CPU using transformers getting stuck during image encoding?
The example transformers code in the README.md stalls on two different machines. I'm using the example code as is; the only change is the path to the image (1092x1040, PNG).

I stopped the execution after 8 minutes. The trace shows that execution was in encode_image() > _run_vision_encoder() > vision_encoder() > mlp() > linear().

Can anyone confirm whether that code works on CPU, or whether this is a known issue?

The previous version of the model (2024-08-26), using the associated transformers code, works well on the same input image, in the exact same Python environment, on the same machine. It completes in 23s.
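For anyone hitting the same stall: a lightweight way to see where a stuck Python process is sitting, without attaching a debugger, is the standard-library `faulthandler` watchdog. This is only a sketch; the 60-second timeout and the commented-out call are placeholders:

```python
import faulthandler
import sys

# Arm a watchdog: if the process is still running after 60 seconds,
# dump the stack of every thread to stderr. This shows where a call
# like encode_image() is stuck without killing the run.
faulthandler.dump_traceback_later(60, file=sys.stderr)

# ... run the suspected-hanging call here, e.g.
# enc_image = model.encode_image(image)

# Disarm the watchdog once the call returns.
faulthandler.cancel_dump_traceback_later()
```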
System
- Ubuntu 24.10
- i7-1260P
- 32 GB RAM
- python 3.12.7
pip install transformers torch einops pillow pyvips pyvips-binary torchvision
certifi==2024.12.14
cffi==1.17.1
charset-normalizer==3.4.1
einops==0.8.0
filelock==3.17.0
fsspec==2024.12.0
huggingface-hub==0.28.0
idna==3.10
Jinja2==3.1.5
MarkupSafe==3.0.2
mpmath==1.3.0
networkx==3.4.2
numpy==2.2.2
nvidia-cublas-cu12==12.4.5.8
nvidia-cuda-cupti-cu12==12.4.127
nvidia-cuda-nvrtc-cu12==12.4.127
nvidia-cuda-runtime-cu12==12.4.127
nvidia-cudnn-cu12==9.1.0.70
nvidia-cufft-cu12==11.2.1.3
nvidia-curand-cu12==10.3.5.147
nvidia-cusolver-cu12==11.6.1.9
nvidia-cusparse-cu12==12.3.1.170
nvidia-nccl-cu12==2.21.5
nvidia-nvjitlink-cu12==12.4.127
nvidia-nvtx-cu12==12.4.127
packaging==24.2
pillow==11.1.0
pycparser==2.22
pyvips==2.2.3
pyvips-binary==8.16.0
PyYAML==6.0.2
regex==2024.11.6
requests==2.32.3
safetensors==0.5.2
setuptools==75.8.0
sympy==1.13.1
tokenizers==0.21.0
torch==2.5.1
torchvision==0.20.1
tqdm==4.67.1
transformers==4.48.1
triton==3.1.0
typing_extensions==4.12.2
urllib3==2.3.0
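As a quick sanity check of an environment like the one above, it can help to verify that each dependency actually imports; a missing package (torchvision turned out to matter later in this thread) otherwise fails late rather than at install time. A minimal sketch:

```python
# Check that each dependency of the example code can be imported.
# Note these are import names, not pip package names (e.g. PIL vs pillow).
for name in ("torch", "torchvision", "einops", "PIL", "pyvips", "transformers"):
    try:
        __import__(name)
        print(f"{name}: ok")
    except ImportError as exc:
        print(f"{name}: MISSING ({exc})")
```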
I can confirm that I encountered similar issues with rev. 2025-01-09 on different machines:
- Win10 with py3.11.5 (besides pip, also tried conda)
- Win11 with py3.12.7 (pip and conda)
- also tried WSL and Ubuntu
- rev. 2024-08-26 works well
This is also discussed here: https://huggingface.co/vikhyatk/moondream2/discussions/53 and here: https://huggingface.co/vikhyatk/moondream2/discussions/59
@autmoate @geoffroy-noel-ddh Sorry to hear that you're running into issues. Windows requires some additional steps w/ the latest revision - do you have FFMPEG and Pyvips installed to your machine? Detailed steps on getting the latest revision running are available here.
Thanks a lot for your reply. Yes, I was aware of the pyvips and pyvips-binary install, but I'll follow the instructions you provided again and have a look at ffmpeg. Will give it another try next week, hopefully. Thanks for your hints and help. 🤞
Hi, you can find all the answers in my description at the top. I'm using Ubuntu, pyvips, and pyvips-binary.

ffmpeg is also installed on the machine, although I did not see any mention of ffmpeg in the documentation you linked or in the README, so it's not clear whether it's a moondream requirement.

@parsakhaz Does the example code at the bottom of the README work for you? Have you tried to reproduce the issue? If it works, can you tell me which step exactly is missing from my description above? I believe it follows the installation instructions.
@geoffroy-noel-ddh @parsakhaz I tried again and I'll share what I experienced.

Starting with Ubuntu 22.04: on a fresh Ubuntu cloud instance with Python 3.10.12, I followed these instructions for all the Linux dependencies, and at first I ran into issues again:
python3 testing-moondream.py
config.json: 100%|█████████████████████████████████████████████████████████████████████| 276/276 [00:00<00:00, 1.82MB/s]
hf_moondream.py: 100%|█████████████████████████████████████████████████████████████| 3.51k/3.51k [00:00<00:00, 23.0MB/s]
vision.py: 100%|███████████████████████████████████████████████████████████████████| 4.73k/4.73k [00:00<00:00, 22.4MB/s]
config.py: 100%|███████████████████████████████████████████████████████████████████| 2.38k/2.38k [00:00<00:00, 15.7MB/s]
layers.py: 100%|███████████████████████████████████████████████████████████████████| 1.37k/1.37k [00:00<00:00, 9.11MB/s]
image_crops.py: 100%|██████████████████████████████████████████████████████████████| 7.53k/7.53k [00:00<00:00, 36.1MB/s]
utils.py: 100%|████████████████████████████████████████████████████████████████████| 1.42k/1.42k [00:00<00:00, 9.13MB/s]
moondream.py: 100%|████████████████████████████████████████████████████████████████| 21.3k/21.3k [00:00<00:00, 70.2MB/s]
region.py: 100%|███████████████████████████████████████████████████████████████████| 2.82k/2.82k [00:00<00:00, 16.3MB/s]
weights.py: 100%|██████████████████████████████████████████████████████████████████| 9.71k/9.71k [00:00<00:00, 36.2MB/s]
text.py: 100%|█████████████████████████████████████████████████████████████████████| 5.31k/5.31k [00:00<00:00, 25.9MB/s]
Could not locate the rope.py inside vikhyatk/moondream2.
Traceback (most recent call last):
File "/home/ubuntu/python-projects/moondream/.moondream-venv/lib/python3.10/site-packages/urllib3/connectionpool.py", line 534, in _make_request
response = conn.getresponse()
File "/home/ubuntu/python-projects/moondream/.moondream-venv/lib/python3.10/site-packages/urllib3/connection.py", line 516, in getresponse
httplib_response = super().getresponse()
File "/usr/lib/python3.10/http/client.py", line 1375, in getresponse
response.begin()
File "/usr/lib/python3.10/http/client.py", line 318, in begin
version, status, reason = self._read_status()
File "/usr/lib/python3.10/http/client.py", line 279, in _read_status
line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
File "/usr/lib/python3.10/socket.py", line 705, in readinto
return self._sock.recv_into(b)
File "/usr/lib/python3.10/ssl.py", line 1303, in recv_into
return self.read(nbytes, buffer)
File "/usr/lib/python3.10/ssl.py", line 1159, in read
return self._sslobj.read(len, buffer)
TimeoutError: The read operation timed out
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/ubuntu/python-projects/moondream/.moondream-venv/lib/python3.10/site-packages/requests/adapters.py", line 667, in send
resp = conn.urlopen(
File "/home/ubuntu/python-projects/moondream/.moondream-venv/lib/python3.10/site-packages/urllib3/connectionpool.py", line 841, in urlopen
retries = retries.increment(
File "/home/ubuntu/python-projects/moondream/.moondream-venv/lib/python3.10/site-packages/urllib3/util/retry.py", line 474, in increment
raise reraise(type(error), error, _stacktrace)
File "/home/ubuntu/python-projects/moondream/.moondream-venv/lib/python3.10/site-packages/urllib3/util/util.py", line 39, in reraise
raise value
File "/home/ubuntu/python-projects/moondream/.moondream-venv/lib/python3.10/site-packages/urllib3/connectionpool.py", line 787, in urlopen
response = self._make_request(
File "/home/ubuntu/python-projects/moondream/.moondream-venv/lib/python3.10/site-packages/urllib3/connectionpool.py", line 536, in _make_request
self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
File "/home/ubuntu/python-projects/moondream/.moondream-venv/lib/python3.10/site-packages/urllib3/connectionpool.py", line 367, in _raise_timeout
raise ReadTimeoutError(
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='huggingface.co', port=443): Read timed out. (read timeout=10)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/ubuntu/python-projects/moondream/testing-moondream.py", line 5, in <module>
model = AutoModelForCausalLM.from_pretrained(
File "/home/ubuntu/python-projects/moondream/.moondream-venv/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 526, in from_pretrained
config, kwargs = AutoConfig.from_pretrained(
File "/home/ubuntu/python-projects/moondream/.moondream-venv/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 1063, in from_pretrained
config_class = get_class_from_dynamic_module(
File "/home/ubuntu/python-projects/moondream/.moondream-venv/lib/python3.10/site-packages/transformers/dynamic_module_utils.py", line 541, in get_class_from_dynamic_module
final_module = get_cached_module_file(
File "/home/ubuntu/python-projects/moondream/.moondream-venv/lib/python3.10/site-packages/transformers/dynamic_module_utils.py", line 404, in get_cached_module_file
get_cached_module_file(
File "/home/ubuntu/python-projects/moondream/.moondream-venv/lib/python3.10/site-packages/transformers/dynamic_module_utils.py", line 404, in get_cached_module_file
get_cached_module_file(
File "/home/ubuntu/python-projects/moondream/.moondream-venv/lib/python3.10/site-packages/transformers/dynamic_module_utils.py", line 404, in get_cached_module_file
get_cached_module_file(
File "/home/ubuntu/python-projects/moondream/.moondream-venv/lib/python3.10/site-packages/transformers/dynamic_module_utils.py", line 345, in get_cached_module_file
resolved_module_file = cached_file(
File "/home/ubuntu/python-projects/moondream/.moondream-venv/lib/python3.10/site-packages/transformers/utils/hub.py", line 403, in cached_file
resolved_file = hf_hub_download(
File "/home/ubuntu/python-projects/moondream/.moondream-venv/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
return fn(*args, **kwargs)
File "/home/ubuntu/python-projects/moondream/.moondream-venv/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 860, in hf_hub_download
return _hf_hub_download_to_cache_dir(
File "/home/ubuntu/python-projects/moondream/.moondream-venv/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1009, in _hf_hub_download_to_cache_dir
_download_to_tmp_and_move(
File "/home/ubuntu/python-projects/moondream/.moondream-venv/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1543, in _download_to_tmp_and_move
http_get(
File "/home/ubuntu/python-projects/moondream/.moondream-venv/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 369, in http_get
r = _request_wrapper(
File "/home/ubuntu/python-projects/moondream/.moondream-venv/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 301, in _request_wrapper
response = get_session().request(method=method, url=url, **params)
File "/home/ubuntu/python-projects/moondream/.moondream-venv/lib/python3.10/site-packages/requests/sessions.py", line 589, in request
resp = self.send(prep, **send_kwargs)
File "/home/ubuntu/python-projects/moondream/.moondream-venv/lib/python3.10/site-packages/requests/sessions.py", line 703, in send
r = adapter.send(request, **kwargs)
File "/home/ubuntu/python-projects/moondream/.moondream-venv/lib/python3.10/site-packages/huggingface_hub/utils/_http.py", line 93, in send
return super().send(request, *args, **kwargs)
File "/home/ubuntu/python-projects/moondream/.moondream-venv/lib/python3.10/site-packages/requests/adapters.py", line 713, in send
raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: (ReadTimeoutError("HTTPSConnectionPool(host='huggingface.co', port=443): Read timed out. (read timeout=10)"), '(Request ID: be11ada0-42e0-44f7-a395-115f618f1a7f)')
Retrying brought up this:
python3 testing-moondream.py
Traceback (most recent call last):
File "/home/ubuntu/python-projects/moondream/testing-moondream.py", line 5, in <module>
model = AutoModelForCausalLM.from_pretrained(
File "/home/ubuntu/python-projects/moondream/.moondream-venv/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 526, in from_pretrained
config, kwargs = AutoConfig.from_pretrained(
File "/home/ubuntu/python-projects/moondream/.moondream-venv/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 1063, in from_pretrained
config_class = get_class_from_dynamic_module(
File "/home/ubuntu/python-projects/moondream/.moondream-venv/lib/python3.10/site-packages/transformers/dynamic_module_utils.py", line 553, in get_class_from_dynamic_module
return get_class_in_module(class_name, final_module, force_reload=force_download)
File "/home/ubuntu/python-projects/moondream/.moondream-venv/lib/python3.10/site-packages/transformers/dynamic_module_utils.py", line 238, in get_class_in_module
module_files: List[Path] = [module_file] + sorted(map(Path, get_relative_import_files(module_file)))
File "/home/ubuntu/python-projects/moondream/.moondream-venv/lib/python3.10/site-packages/transformers/dynamic_module_utils.py", line 128, in get_relative_import_files
new_imports.extend(get_relative_imports(f))
File "/home/ubuntu/python-projects/moondream/.moondream-venv/lib/python3.10/site-packages/transformers/dynamic_module_utils.py", line 97, in get_relative_imports
with open(module_file, "r", encoding="utf-8") as f:
FileNotFoundError: [Errno 2] No such file or directory: '/home/ubuntu/.cache/huggingface/modules/transformers_modules/vikhyatk/moondream2/adcbcd1a6d27fc19974b18dc128eb51ef6837879/rope.py'
So I switched the revision to `revision="main"`, and this worked for me!
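For what it's worth, the `FileNotFoundError` for `rope.py` looks like a consequence of the earlier `ReadTimeout`: transformers caches a model's remote code under `~/.cache/huggingface/modules`, and an interrupted download can leave an incomplete copy there. Besides switching revisions, deleting the cached module directory and letting it re-download may also help. A sketch, assuming the default cache location (it may differ on your machine):

```python
import shutil
from pathlib import Path

# Default location where transformers stores downloaded remote code.
# Removing the stale vikhyatk modules forces a clean re-download on
# the next from_pretrained() call.
module_cache = (
    Path.home() / ".cache" / "huggingface" / "modules"
    / "transformers_modules" / "vikhyatk"
)
if module_cache.exists():
    shutil.rmtree(module_cache)
    print(f"Removed {module_cache}")
else:
    print(f"Nothing cached at {module_cache}")
```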
Here is my code, which is mostly the example code from the shared moondream docs link:
```python
from transformers import AutoModelForCausalLM
from PIL import Image
import time

# Initialize the model
model = AutoModelForCausalLM.from_pretrained(
    "vikhyatk/moondream2",
    revision="main",
    trust_remote_code=True,
    # Uncomment for GPU acceleration & pip install accelerate
    # device_map={"": "cuda"},
)

# Load your image. Note the raw string: in "images\test.jpg" the "\t"
# would otherwise be interpreted as a tab character.
image = Image.open(r"images\test.jpg")  # an image showing different vegetables

start = time.time()

# Visual Question Answering
print("\nAsking questions about the image:")
print(model.query(image, "List all foods in the image in json format!")["answer"])

end = time.time()
elapsed_time = end - start
print(elapsed_time)
```
Inference took nearly forever, which is a bit of a bummer (~500s on a 6-core instance with 18GB of RAM), but no wonder: it's a cheap CPU-only cloud instance, I know. I had hoped it would run fast enough on CPU that I wouldn't have to go for a GPU, but GPU acceleration would presumably sidestep this. Thanks again for the help so far!
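One knob worth checking for slow CPU inference is the thread count torch uses; on small cloud instances the default does not always match the cores actually available. A minimal sketch (using the machine's reported core count, not a tuned value):

```python
import os
import torch

# Pin torch's intra-op parallelism to the visible core count before
# loading the model; the default can undershoot on some instances.
torch.set_num_threads(os.cpu_count() or 1)
print("torch threads:", torch.get_num_threads())
```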
And now for testing Win11 again (setup as mentioned above). I installed ffmpeg and verified it:
ffmpeg -version
ffmpeg version 7.1-full_build-www.gyan.dev Copyright (c) 2000-2024 the FFmpeg developers
built with gcc 14.2.0 (Rev1, Built by MSYS2 project)
configuration: --enable-gpl --enable-version3 --enable-static --disable-w32threads --disable-autodetect --enable-fontconfig --enable-iconv --enable-gnutls --enable-libxml2 --enable-gmp --enable-bzlib --enable-lzma --enable-libsnappy --enable-zlib --enable-librist --enable-libsrt --enable-libssh --enable-libzmq --enable-avisynth --enable-libbluray --enable-libcaca --enable-sdl2 --enable-libaribb24 --enable-libaribcaption --enable-libdav1d --enable-libdavs2 --enable-libopenjpeg --enable-libquirc --enable-libuavs3d --enable-libxevd --enable-libzvbi --enable-libqrencode --enable-librav1e --enable-libsvtav1 --enable-libvvenc --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxavs2 --enable-libxeve --enable-libxvid --enable-libaom --enable-libjxl --enable-libvpx --enable-mediafoundation --enable-libass --enable-frei0r --enable-libfreetype --enable-libfribidi --enable-libharfbuzz --enable-liblensfun --enable-libvidstab --enable-libvmaf --enable-libzimg --enable-amf --enable-cuda-llvm --enable-cuvid --enable-dxva2 --enable-d3d11va --enable-d3d12va --enable-ffnvcodec --enable-libvpl --enable-nvdec --enable-nvenc --enable-vaapi --enable-libshaderc --enable-vulkan --enable-libplacebo --enable-opencl --enable-libcdio --enable-libgme --enable-libmodplug --enable-libopenmpt --enable-libopencore-amrwb --enable-libmp3lame --enable-libshine --enable-libtheora --enable-libtwolame --enable-libvo-amrwbenc --enable-libcodec2 --enable-libilbc --enable-libgsm --enable-liblc3 --enable-libopencore-amrnb --enable-libopus --enable-libspeex --enable-libvorbis --enable-ladspa --enable-libbs2b --enable-libflite --enable-libmysofa --enable-librubberband --enable-libsoxr --enable-chromaprint
libavutil 59. 39.100 / 59. 39.100
libavcodec 61. 19.100 / 61. 19.100
libavformat 61. 7.100 / 61. 7.100
libavdevice 61. 3.100 / 61. 3.100
libavfilter 10. 4.100 / 10. 4.100
libswscale 8. 3.100 / 8. 3.100
libswresample 5. 3.100 / 5. 3.100
libpostproc 58. 3.100 / 58. 3.100
That wasn't the trick. So I followed the instructions in the moondream docs, checked all the other dependencies for Windows, and extracted the vips DLLs from /bin to the project root. That didn't help either. Checking the Task Manager, I can see that the code uses ~1GB of RAM but the CPU doesn't do anything. I guess the RAM represents the image encoding, but the moondream model isn't loaded.
The last thing I tested from my experience with the Ubuntu instance was to again change `revision="2025-01-09"` to `revision="main"`. But that wasn't the solution either. Finally, I tried system environment variables for the DLLs, but no, no chance. The model isn't loaded, and nothing else happens. Strange, and a pity.
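As an aside, instead of copying the vips DLLs into the project root or editing system environment variables, Python 3.8+ on Windows can be pointed at the DLL folder directly with `os.add_dll_directory` before importing pyvips. A sketch; `C:\vips-dev\bin` is an assumed extraction path, so adjust it to wherever you unpacked vips:

```python
import os
import sys

# Register the vips DLL folder for this process only (Windows,
# Python 3.8+), so pyvips can resolve libvips without copied DLLs.
# The path below is an assumed extraction location; adjust as needed.
if sys.platform == "win32":
    os.add_dll_directory(r"C:\vips-dev\bin")
    import pyvips  # noqa: F401  # resolves DLLs via the directory above
```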
~~Is there anything else that could be the problem here? Any other ideas?~~
Edit: I finally solved it under Win11 (py3.12.7) with ffmpeg 7.1_full and the vips DLLs (see moondream docs). I noticed that torchvision wasn't installed, so I gave it a try, even though it wasn't mentioned in the dependencies. And lo and behold, it works! For me it was torch-2.6.0 and torchvision-0.21.0.
See details here:
pip show torchvision
WARNING: Package(s) not found: torchvision
pip install torchvision
Collecting torchvision
Using cached torchvision-0.21.0-cp312-cp312-win_amd64.whl.metadata (6.3 kB)
Requirement already satisfied: numpy in c:\users\welz\documents\living_lab\_demobau\softwareentwicklung\moondream\.venv-250128\lib\site-packages (from torchvision) (2.2.2)
Collecting torch==2.6.0 (from torchvision)
...
Using cached torchvision-0.21.0-cp312-cp312-win_amd64.whl (1.6 MB)
Using cached torch-2.6.0-cp312-cp312-win_amd64.whl (204.1 MB)
Installing collected packages: torch, torchvision
Attempting uninstall: torch
Found existing installation: torch 2.5.1
Uninstalling torch-2.5.1:
Successfully uninstalled torch-2.5.1
Successfully installed torch-2.6.0 torchvision-0.21.0
Here is the code:
```python
from transformers import AutoModelForCausalLM
from PIL import Image
import time

model = AutoModelForCausalLM.from_pretrained(
    "vikhyatk/moondream2",
    revision="main",
    trust_remote_code=True,
    # Uncomment to run on GPU.
    # device_map={"": "cuda"},
)
# print(model)
print("Model loaded")

image_path = "vegetables_test.jpg"
image = Image.open(image_path)

start = time.time()
# Note: enc_image is computed here but not passed to query() below,
# so the timing includes the image being encoded again for the query.
enc_image = model.encode_image(image)
print(model.query(image, "List all foods in json format")["answer"])
end = time.time()

elapsed_time = end - start
print(f"Elapsed time: {elapsed_time:.2f} seconds")
```
Unfortunately, inference still took ~220s. I will check with a preloaded model.

Thanks for all the help again! I'd recommend adding a troubleshooting section somewhere, maybe?
I'm also facing the same issue, using 4 vCPUs and 16GB RAM on a Codespaces VM. I replicated the exact setup and steps mentioned in the "gaze detection" recipe. The model runs successfully, but it is very slow, taking more than a minute to process a single frame. Resource usage is also normal: memory usage 30%, CPU usage 25%.
@vikhyat
Me too. It's not working locally using transformers, no matter how many errors I try to fix.