Any known issue with 2025-01-09 on CPU using transformers getting stuck during image encoding?
The example transformers code in the README.md stalls on two different machines. I'm using the example code as is; the only change is the path to the image (1092x1040, PNG).

I stopped the execution after 8 minutes. The trace shows that execution was in encode_image() > _run_vision_encoder() > vision_encoder() > mlp() > linear().

Can anyone confirm whether that code works on CPU, or whether this is a known issue?

The previous version of the model (2024-08-26), using the associated transformers code, works well on the same input image, in the exact same Python environment, on the same machine. It completes in 23s.
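For anyone hitting the same stall: a lightweight way to see where a stuck Python process is sitting, without attaching a debugger, is the standard-library `faulthandler` watchdog. This is only a sketch; the 60-second timeout and the commented-out call are placeholders:

```python
import faulthandler
import sys

# Arm a watchdog: if the process is still running after 60 seconds,
# dump the stack of every thread to stderr. This shows where a call
# like encode_image() is stuck without killing the run.
faulthandler.dump_traceback_later(60, file=sys.stderr)

# ... run the suspected-hanging call here, e.g.
# enc_image = model.encode_image(image)

# Disarm the watchdog once the call returns.
faulthandler.cancel_dump_traceback_later()
```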
System
- Ubuntu 24.10
- i7-1260P
- 32 GB RAM
- python 3.12.7
pip install transformers torch einops pillow pyvips pyvips-binary torchvision
certifi==2024.12.14
cffi==1.17.1
charset-normalizer==3.4.1
einops==0.8.0
filelock==3.17.0
fsspec==2024.12.0
huggingface-hub==0.28.0
idna==3.10
Jinja2==3.1.5
MarkupSafe==3.0.2
mpmath==1.3.0
networkx==3.4.2
numpy==2.2.2
nvidia-cublas-cu12==12.4.5.8
nvidia-cuda-cupti-cu12==12.4.127
nvidia-cuda-nvrtc-cu12==12.4.127
nvidia-cuda-runtime-cu12==12.4.127
nvidia-cudnn-cu12==9.1.0.70
nvidia-cufft-cu12==11.2.1.3
nvidia-curand-cu12==10.3.5.147
nvidia-cusolver-cu12==11.6.1.9
nvidia-cusparse-cu12==12.3.1.170
nvidia-nccl-cu12==2.21.5
nvidia-nvjitlink-cu12==12.4.127
nvidia-nvtx-cu12==12.4.127
packaging==24.2
pillow==11.1.0
pycparser==2.22
pyvips==2.2.3
pyvips-binary==8.16.0
PyYAML==6.0.2
regex==2024.11.6
requests==2.32.3
safetensors==0.5.2
setuptools==75.8.0
sympy==1.13.1
tokenizers==0.21.0
torch==2.5.1
torchvision==0.20.1
tqdm==4.67.1
transformers==4.48.1
triton==3.1.0
typing_extensions==4.12.2
urllib3==2.3.0
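As a quick sanity check of an environment like the one above, it can help to verify that each dependency actually imports; a missing package (torchvision turned out to matter later in this thread) otherwise fails late rather than at install time. A minimal sketch:

```python
# Check that each dependency of the example code can be imported.
# Note these are import names, not pip package names (e.g. PIL vs pillow).
for name in ("torch", "torchvision", "einops", "PIL", "pyvips", "transformers"):
    try:
        __import__(name)
        print(f"{name}: ok")
    except ImportError as exc:
        print(f"{name}: MISSING ({exc})")
```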
I can confirm that I encountered similar issues with rev. 2025-01-09 on different machines:
- Win10 with py3.11.5 (besides pip, also tried conda)
- Win11 with py3.12.7 (pip and conda)
- also tried WSL and Ubuntu
- rev. 2024-08-26 works well
This is also discussed here: https://huggingface.co/vikhyatk/moondream2/discussions/53 and here: https://huggingface.co/vikhyatk/moondream2/discussions/59
@autmoate @geoffroy-noel-ddh Sorry to hear that you're running into issues. Windows requires some additional steps w/ the latest revision - do you have FFMPEG and Pyvips installed to your machine? Detailed steps on getting the latest revision running are available here.
Thanks a lot for your reply. Yes, I was aware of the pyvips and pyvips-binary install, but I'll follow the instructions you provided again and have a look at ffmpeg. Will give it another try next week, hopefully. Thanks for your hints and help. 🤞
Hi, you can find all the answers in my description at the top. I'm using Ubuntu, pyvips, and pyvips-binary.

ffmpeg is also installed on the machine, although I did not see any mention of ffmpeg in the documentation you linked or in the README, so it's not clear whether it's a moondream requirement.

@parsakhaz Does the example code at the bottom of the README work for you? Have you tried to reproduce the issue? If it works, can you tell me which step exactly is missing from my description above? I believe it follows the installation instructions.
@geoffroy-noel-ddh @parsakhaz I tried again and I'll share what I experienced.

Starting with Ubuntu 22.04: on a fresh Ubuntu cloud instance with Python 3.10.12, I followed these instructions for all the Linux dependencies, and at first I ran into issues again:
python3 testing-moondream.py
config.json: 100%|█████████████████████████████████████████████████████████████████████| 276/276 [00:00<00:00, 1.82MB/s]
hf_moondream.py: 100%|█████████████████████████████████████████████████████████████| 3.51k/3.51k [00:00<00:00, 23.0MB/s]
vision.py: 100%|███████████████████████████████████████████████████████████████████| 4.73k/4.73k [00:00<00:00, 22.4MB/s]
config.py: 100%|███████████████████████████████████████████████████████████████████| 2.38k/2.38k [00:00<00:00, 15.7MB/s]
layers.py: 100%|███████████████████████████████████████████████████████████████████| 1.37k/1.37k [00:00<00:00, 9.11MB/s]
image_crops.py: 100%|██████████████████████████████████████████████████████████████| 7.53k/7.53k [00:00<00:00, 36.1MB/s]
utils.py: 100%|████████████████████████████████████████████████████████████████████| 1.42k/1.42k [00:00<00:00, 9.13MB/s]
moondream.py: 100%|████████████████████████████████████████████████████████████████| 21.3k/21.3k [00:00<00:00, 70.2MB/s]
region.py: 100%|███████████████████████████████████████████████████████████████████| 2.82k/2.82k [00:00<00:00, 16.3MB/s]
weights.py: 100%|██████████████████████████████████████████████████████████████████| 9.71k/9.71k [00:00<00:00, 36.2MB/s]
text.py: 100%|█████████████████████████████████████████████████████████████████████| 5.31k/5.31k [00:00<00:00, 25.9MB/s]
Could not locate the rope.py inside vikhyatk/moondream2.
Traceback (most recent call last):
File "/home/ubuntu/python-projects/moondream/.moondream-venv/lib/python3.10/site-packages/urllib3/connectionpool.py", line 534, in _make_request
response = conn.getresponse()
File "/home/ubuntu/python-projects/moondream/.moondream-venv/lib/python3.10/site-packages/urllib3/connection.py", line 516, in getresponse
httplib_response = super().getresponse()
File "/usr/lib/python3.10/http/client.py", line 1375, in getresponse
response.begin()
File "/usr/lib/python3.10/http/client.py", line 318, in begin
version, status, reason = self._read_status()
File "/usr/lib/python3.10/http/client.py", line 279, in _read_status
line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
File "/usr/lib/python3.10/socket.py", line 705, in readinto
return self._sock.recv_into(b)
File "/usr/lib/python3.10/ssl.py", line 1303, in recv_into
return self.read(nbytes, buffer)
File "/usr/lib/python3.10/ssl.py", line 1159, in read
return self._sslobj.read(len, buffer)
TimeoutError: The read operation timed out
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/ubuntu/python-projects/moondream/.moondream-venv/lib/python3.10/site-packages/requests/adapters.py", line 667, in send
resp = conn.urlopen(
File "/home/ubuntu/python-projects/moondream/.moondream-venv/lib/python3.10/site-packages/urllib3/connectionpool.py", line 841, in urlopen
retries = retries.increment(
File "/home/ubuntu/python-projects/moondream/.moondream-venv/lib/python3.10/site-packages/urllib3/util/retry.py", line 474, in increment
raise reraise(type(error), error, _stacktrace)
File "/home/ubuntu/python-projects/moondream/.moondream-venv/lib/python3.10/site-packages/urllib3/util/util.py", line 39, in reraise
raise value
File "/home/ubuntu/python-projects/moondream/.moondream-venv/lib/python3.10/site-packages/urllib3/connectionpool.py", line 787, in urlopen
response = self._make_request(
File "/home/ubuntu/python-projects/moondream/.moondream-venv/lib/python3.10/site-packages/urllib3/connectionpool.py", line 536, in _make_request
self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
File "/home/ubuntu/python-projects/moondream/.moondream-venv/lib/python3.10/site-packages/urllib3/connectionpool.py", line 367, in _raise_timeout
raise ReadTimeoutError(
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='huggingface.co', port=443): Read timed out. (read timeout=10)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/ubuntu/python-projects/moondream/testing-moondream.py", line 5, in <module>
model = AutoModelForCausalLM.from_pretrained(
File "/home/ubuntu/python-projects/moondream/.moondream-venv/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 526, in from_pretrained
config, kwargs = AutoConfig.from_pretrained(
File "/home/ubuntu/python-projects/moondream/.moondream-venv/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 1063, in from_pretrained
config_class = get_class_from_dynamic_module(
File "/home/ubuntu/python-projects/moondream/.moondream-venv/lib/python3.10/site-packages/transformers/dynamic_module_utils.py", line 541, in get_class_from_dynamic_module
final_module = get_cached_module_file(
File "/home/ubuntu/python-projects/moondream/.moondream-venv/lib/python3.10/site-packages/transformers/dynamic_module_utils.py", line 404, in get_cached_module_file
get_cached_module_file(
File "/home/ubuntu/python-projects/moondream/.moondream-venv/lib/python3.10/site-packages/transformers/dynamic_module_utils.py", line 404, in get_cached_module_file
get_cached_module_file(
File "/home/ubuntu/python-projects/moondream/.moondream-venv/lib/python3.10/site-packages/transformers/dynamic_module_utils.py", line 404, in get_cached_module_file
get_cached_module_file(
File "/home/ubuntu/python-projects/moondream/.moondream-venv/lib/python3.10/site-packages/transformers/dynamic_module_utils.py", line 345, in get_cached_module_file
resolved_module_file = cached_file(
File "/home/ubuntu/python-projects/moondream/.moondream-venv/lib/python3.10/site-packages/transformers/utils/hub.py", line 403, in cached_file
resolved_file = hf_hub_download(
File "/home/ubuntu/python-projects/moondream/.moondream-venv/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
return fn(*args, **kwargs)
File "/home/ubuntu/python-projects/moondream/.moondream-venv/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 860, in hf_hub_download
return _hf_hub_download_to_cache_dir(
File "/home/ubuntu/python-projects/moondream/.moondream-venv/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1009, in _hf_hub_download_to_cache_dir
_download_to_tmp_and_move(
File "/home/ubuntu/python-projects/moondream/.moondream-venv/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1543, in _download_to_tmp_and_move
http_get(
File "/home/ubuntu/python-projects/moondream/.moondream-venv/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 369, in http_get
r = _request_wrapper(
File "/home/ubuntu/python-projects/moondream/.moondream-venv/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 301, in _request_wrapper
response = get_session().request(method=method, url=url, **params)
File "/home/ubuntu/python-projects/moondream/.moondream-venv/lib/python3.10/site-packages/requests/sessions.py", line 589, in request
resp = self.send(prep, **send_kwargs)
File "/home/ubuntu/python-projects/moondream/.moondream-venv/lib/python3.10/site-packages/requests/sessions.py", line 703, in send
r = adapter.send(request, **kwargs)
File "/home/ubuntu/python-projects/moondream/.moondream-venv/lib/python3.10/site-packages/huggingface_hub/utils/_http.py", line 93, in send
return super().send(request, *args, **kwargs)
File "/home/ubuntu/python-projects/moondream/.moondream-venv/lib/python3.10/site-packages/requests/adapters.py", line 713, in send
raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: (ReadTimeoutError("HTTPSConnectionPool(host='huggingface.co', port=443): Read timed out. (read timeout=10)"), '(Request ID: be11ada0-42e0-44f7-a395-115f618f1a7f)')
Retrying brought up this:
python3 testing-moondream.py
Traceback (most recent call last):
File "/home/ubuntu/python-projects/moondream/testing-moondream.py", line 5, in <module>
model = AutoModelForCausalLM.from_pretrained(
File "/home/ubuntu/python-projects/moondream/.moondream-venv/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 526, in from_pretrained
config, kwargs = AutoConfig.from_pretrained(
File "/home/ubuntu/python-projects/moondream/.moondream-venv/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 1063, in from_pretrained
config_class = get_class_from_dynamic_module(
File "/home/ubuntu/python-projects/moondream/.moondream-venv/lib/python3.10/site-packages/transformers/dynamic_module_utils.py", line 553, in get_class_from_dynamic_module
return get_class_in_module(class_name, final_module, force_reload=force_download)
File "/home/ubuntu/python-projects/moondream/.moondream-venv/lib/python3.10/site-packages/transformers/dynamic_module_utils.py", line 238, in get_class_in_module
module_files: List[Path] = [module_file] + sorted(map(Path, get_relative_import_files(module_file)))
File "/home/ubuntu/python-projects/moondream/.moondream-venv/lib/python3.10/site-packages/transformers/dynamic_module_utils.py", line 128, in get_relative_import_files
new_imports.extend(get_relative_imports(f))
File "/home/ubuntu/python-projects/moondream/.moondream-venv/lib/python3.10/site-packages/transformers/dynamic_module_utils.py", line 97, in get_relative_imports
with open(module_file, "r", encoding="utf-8") as f:
FileNotFoundError: [Errno 2] No such file or directory: '/home/ubuntu/.cache/huggingface/modules/transformers_modules/vikhyatk/moondream2/adcbcd1a6d27fc19974b18dc128eb51ef6837879/rope.py'
So I switched the revision to `revision="main"`, and this worked for me!
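For what it's worth, the `FileNotFoundError` for `rope.py` looks like a consequence of the earlier `ReadTimeout`: transformers caches a model's remote code under `~/.cache/huggingface/modules`, and an interrupted download can leave an incomplete copy there. Besides switching revisions, deleting the cached module directory and letting it re-download may also help. A sketch, assuming the default cache location (it may differ on your machine):

```python
import shutil
from pathlib import Path

# Default location where transformers stores downloaded remote code.
# Removing the stale vikhyatk modules forces a clean re-download on
# the next from_pretrained() call.
module_cache = (
    Path.home() / ".cache" / "huggingface" / "modules"
    / "transformers_modules" / "vikhyatk"
)
if module_cache.exists():
    shutil.rmtree(module_cache)
    print(f"Removed {module_cache}")
else:
    print(f"Nothing cached at {module_cache}")
```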
Here is my code, which is mostly the example code from the shared moondream docs link:
```python
from transformers import AutoModelForCausalLM
from PIL import Image
import time

# Initialize the model
model = AutoModelForCausalLM.from_pretrained(
    "vikhyatk/moondream2",
    revision="main",
    trust_remote_code=True,
    # Uncomment for GPU acceleration & pip install accelerate
    # device_map={"": "cuda"},
)

# Load your image. Note the raw string: in "images\test.jpg" the "\t"
# would otherwise be interpreted as a tab character.
image = Image.open(r"images\test.jpg")  # an image showing different vegetables

start = time.time()

# Visual Question Answering
print("\nAsking questions about the image:")
print(model.query(image, "List all foods in the image in json format!")["answer"])

end = time.time()
elapsed_time = end - start
print(elapsed_time)
```
Inference took nearly forever, which is a bit of a bummer (~500s on a 6-core instance with 18GB of RAM), but no wonder: it's a cheap CPU-only cloud instance, I know. I had hoped it would run fast enough on CPU that I wouldn't have to go for a GPU, but GPU acceleration would presumably sidestep this. Thanks again for the help so far!
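One knob worth checking for slow CPU inference is the thread count torch uses; on small cloud instances the default does not always match the cores actually available. A minimal sketch (using the machine's reported core count, not a tuned value):

```python
import os
import torch

# Pin torch's intra-op parallelism to the visible core count before
# loading the model; the default can undershoot on some instances.
torch.set_num_threads(os.cpu_count() or 1)
print("torch threads:", torch.get_num_threads())
```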
And now for testing Win11 again (setup as mentioned above). I installed ffmpeg and verified it:
ffmpeg -version
ffmpeg version 7.1-full_build-www.gyan.dev Copyright (c) 2000-2024 the FFmpeg developers
built with gcc 14.2.0 (Rev1, Built by MSYS2 project)
configuration: --enable-gpl --enable-version3 --enable-static --disable-w32threads --disable-autodetect --enable-fontconfig --enable-iconv --enable-gnutls --enable-libxml2 --enable-gmp --enable-bzlib --enable-lzma --enable-libsnappy --enable-zlib --enable-librist --enable-libsrt --enable-libssh --enable-libzmq --enable-avisynth --enable-libbluray --enable-libcaca --enable-sdl2 --enable-libaribb24 --enable-libaribcaption --enable-libdav1d --enable-libdavs2 --enable-libopenjpeg --enable-libquirc --enable-libuavs3d --enable-libxevd --enable-libzvbi --enable-libqrencode --enable-librav1e --enable-libsvtav1 --enable-libvvenc --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxavs2 --enable-libxeve --enable-libxvid --enable-libaom --enable-libjxl --enable-libvpx --enable-mediafoundation --enable-libass --enable-frei0r --enable-libfreetype --enable-libfribidi --enable-libharfbuzz --enable-liblensfun --enable-libvidstab --enable-libvmaf --enable-libzimg --enable-amf --enable-cuda-llvm --enable-cuvid --enable-dxva2 --enable-d3d11va --enable-d3d12va --enable-ffnvcodec --enable-libvpl --enable-nvdec --enable-nvenc --enable-vaapi --enable-libshaderc --enable-vulkan --enable-libplacebo --enable-opencl --enable-libcdio --enable-libgme --enable-libmodplug --enable-libopenmpt --enable-libopencore-amrwb --enable-libmp3lame --enable-libshine --enable-libtheora --enable-libtwolame --enable-libvo-amrwbenc --enable-libcodec2 --enable-libilbc --enable-libgsm --enable-liblc3 --enable-libopencore-amrnb --enable-libopus --enable-libspeex --enable-libvorbis --enable-ladspa --enable-libbs2b --enable-libflite --enable-libmysofa --enable-librubberband --enable-libsoxr --enable-chromaprint
libavutil 59. 39.100 / 59. 39.100
libavcodec 61. 19.100 / 61. 19.100
libavformat 61. 7.100 / 61. 7.100
libavdevice 61. 3.100 / 61. 3.100
libavfilter 10. 4.100 / 10. 4.100
libswscale 8. 3.100 / 8. 3.100
libswresample 5. 3.100 / 5. 3.100
libpostproc 58. 3.100 / 58. 3.100
That wasn't the trick. So I followed the instructions in the moondream docs, checked all the other dependencies for Windows, and extracted the vips DLLs from /bin to the project root. That didn't help either. Checking the Task Manager, I can see that the code uses ~1GB of RAM but the CPU doesn't do anything. I guess the RAM represents the image encoding, but the moondream model isn't loaded.
The last thing I tested from my experience with the Ubuntu instance was to again change `revision="2025-01-09"` to `revision="main"`. But that wasn't the solution either. Finally, I tried system environment variables for the DLLs, but no, no chance. The model isn't loaded, and nothing else happens. Strange, and a pity.
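As an aside, instead of copying the vips DLLs into the project root or editing system environment variables, Python 3.8+ on Windows can be pointed at the DLL folder directly with `os.add_dll_directory` before importing pyvips. A sketch; `C:\vips-dev\bin` is an assumed extraction path, so adjust it to wherever you unpacked vips:

```python
import os
import sys

# Register the vips DLL folder for this process only (Windows,
# Python 3.8+), so pyvips can resolve libvips without copied DLLs.
# The path below is an assumed extraction location; adjust as needed.
if sys.platform == "win32":
    os.add_dll_directory(r"C:\vips-dev\bin")
    import pyvips  # noqa: F401  # resolves DLLs via the directory above
```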
~~Is there anything else that could be the problem here? Any other ideas?~~
Edit: I finally solved it under Win11 (py3.12.7) with ffmpeg 7.1_full and the vips DLLs (see moondream docs). I noticed that torchvision wasn't installed, so I gave it a try, even though it wasn't mentioned in the dependencies. And lo and behold, it works! For me it was torch-2.6.0 and torchvision-0.21.0.
See details here:
pip show torchvision
WARNING: Package(s) not found: torchvision
pip install torchvision
Collecting torchvision
Using cached torchvision-0.21.0-cp312-cp312-win_amd64.whl.metadata (6.3 kB)
Requirement already satisfied: numpy in c:\users\welz\documents\living_lab\_demobau\softwareentwicklung\moondream\.venv-250128\lib\site-packages (from torchvision) (2.2.2)
Collecting torch==2.6.0 (from torchvision)
...
Using cached torchvision-0.21.0-cp312-cp312-win_amd64.whl (1.6 MB)
Using cached torch-2.6.0-cp312-cp312-win_amd64.whl (204.1 MB)
Installing collected packages: torch, torchvision
Attempting uninstall: torch
Found existing installation: torch 2.5.1
Uninstalling torch-2.5.1:
Successfully uninstalled torch-2.5.1
Successfully installed torch-2.6.0 torchvision-0.21.0
Here is the code:
```python
from transformers import AutoModelForCausalLM
from PIL import Image
import time

model = AutoModelForCausalLM.from_pretrained(
    "vikhyatk/moondream2",
    revision="main",
    trust_remote_code=True,
    # Uncomment to run on GPU.
    # device_map={"": "cuda"},
)
# print(model)
print("Model loaded")

image_path = "vegetables_test.jpg"
image = Image.open(image_path)

start = time.time()
# Note: enc_image is computed here but not passed to query() below,
# so the timing includes the image being encoded again for the query.
enc_image = model.encode_image(image)
print(model.query(image, "List all foods in json format")["answer"])
end = time.time()

elapsed_time = end - start
print(f"Elapsed time: {elapsed_time:.2f} seconds")
```
Unfortunately, inference still took ~220s. I will check with a preloaded model.

Thanks for all the help again! I'd recommend adding a troubleshooting section somewhere, maybe?
I'm also facing the same issue, using 4 vCPUs and 16GB RAM on a Codespaces VM. I replicated the exact setup and steps mentioned in the "gaze detection" recipe. The model runs successfully, but it is very slow, taking more than a minute to process a single frame. Resource usage is also normal: memory usage 30%, CPU usage 25%.
@vikhyat
Me too. It's not working locally using transformers, no matter how many errors I try to fix.