python3 sample.py --image img.png --prompt "hi"
I get a different error on my Jetson Orin running JetPack 5.1.2:
Using device: cuda
If you run into issues, pass the --cpu flag to this script.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Traceback (most recent call last):
  File "sample.py", line 32, in <module>
    moondream = Moondream.from_pretrained(
  File "/home/theuser/Devs/GitHub/MoonDream/MoonDreamEnv/lib/python3.8/site-packages/transformers/modeling_utils.py", line 3462, in from_pretrained
    model = cls(config, *model_args, **model_kwargs)
  File "/home/theuser/Devs/GitHub/MoonDream/moondream/moondream/moondream.py", line 16, in __init__
    self.vision_encoder = VisionEncoder()
  File "/home/theuser/Devs/GitHub/MoonDream/moondream/moondream/vision_encoder.py", line 98, in __init__
    VisualHolder(timm.create_model("vit_so400m_patch14_siglip_384"))
  File "/home/theuser/Devs/GitHub/MoonDream/MoonDreamEnv/lib/python3.8/site-packages/timm/models/_factory.py", line 117, in create_model
    model = create_fn(
  File "/home/theuser/Devs/GitHub/MoonDream/MoonDreamEnv/lib/python3.8/site-packages/timm/models/vision_transformer.py", line 2598, in vit_so400m_patch14_siglip_384
    model = create_vision_transformer(
  File "/home/theuser/Devs/GitHub/MoonDream/MoonDreamEnv/lib/python3.8/site-packages/timm/models/vision_transformer.py", line 1764, in create_vision_transformer
    return build_model_with_cfg(
  File "/home/theuser/Devs/GitHub/MoonDream/MoonDreamEnv/lib/python3.8/site-packages/timm/models/builder.py", line 385, in build_model_with_cfg
    model = model_cls(**kwargs)
  File "/home/theuser/Devs/GitHub/MoonDream/MoonDreamEnv/lib/python3.8/site-packages/timm/models/vision_transformer.py", line 525, in __init__
    self.attn_pool = AttentionPoolLatent(
  File "/home/theuser/Devs/GitHub/MoonDream/MoonDreamEnv/lib/python3.8/site-packages/timm/layers/attention_pool.py", line 63, in __init__
    self.init_weights()
  File "/home/theuser/Devs/GitHub/MoonDream/MoonDreamEnv/lib/python3.8/site-packages/timm/layers/attention_pool.py", line 68, in init_weights
    trunc_normal_tf_(self.latent, std=self.latent_dim ** -0.5)
  File "/home/theuser/Devs/GitHub/MoonDream/MoonDreamEnv/lib/python3.8/site-packages/timm/layers/weight_init.py", line 94, in trunc_normal_tf_
    _trunc_normal_(tensor, 0, 1.0, a, b)
  File "/home/theuser/Devs/GitHub/MoonDream/MoonDreamEnv/lib/python3.8/site-packages/timm/layers/weight_init.py", line 32, in _trunc_normal_
    tensor.erfinv_()
RuntimeError: "erfinv_vml_cpu" not implemented for 'Half'
I made sure to install torch/torchvision builds that are compatible with CUDA 11.4:
torch 2.1.0a0+41361538.nv23.6
torchvision 0.16.2
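For anyone who wants to check whether their own torch build has the same gap, here is a minimal probe. It is only a sketch: `erfinv_supported_on_cpu` is a hypothetical helper name, and the result depends on the installed torch version, so no particular output is promised for float16.

```python
import torch


def erfinv_supported_on_cpu(dtype):
    """Return True if an in-place erfinv runs on a CPU tensor of `dtype`.

    Hypothetical helper for illustration; not part of torch, timm or moondream.
    """
    try:
        # Same in-place call that fails at the bottom of the traceback.
        torch.zeros(1, dtype=dtype).erfinv_()
        return True
    except RuntimeError:
        # Raised when no CPU kernel exists for this dtype, as with 'Half' above.
        return False


if __name__ == "__main__":
    for dtype in (torch.float32, torch.float16):
        print(dtype, erfinv_supported_on_cpu(dtype))
```

If float16 reports False, weight initialization that calls `erfinv_` on a CPU half tensor will hit the error shown in the traceback.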
Mar 16 '24 05:03
whab
I fixed my issue with sample.py by modifying python3.8/site-packages/timm/layers/weight_init.py to convert the tensor to float32 before the incompatible operations and back to float16 afterwards. This code appears to run only during model initialization, before any inference, so it does not seem to affect performance. Running the webcam_gradio_demo.py demo (which, by the way, is not affected by the 'Half' issue above) on my Jetson Orin Dev Kit, I get an update every 2 seconds or even a bit less, slightly faster than my Mac mini M2 Pro running MoonDream with MPS Torch ;)
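The exact patch was not posted, so the following is only a sketch of the general idea, assuming the fix amounts to doing the truncated-normal math on a float32 copy and writing the result back into the float16 tensor. `trunc_normal_fp32_` is a hypothetical stand-alone helper, not timm's function, and the default arguments are illustrative.

```python
import math

import torch


def trunc_normal_fp32_(tensor, mean=0.0, std=1.0, a=-2.0, b=2.0):
    """Fill `tensor` in place from a truncated normal, computing in float32.

    Hypothetical helper for illustration; it is not the code shipped in timm.
    """
    def norm_cdf(x):
        # CDF of the standard normal distribution.
        return (1.0 + math.erf(x / math.sqrt(2.0))) / 2.0

    with torch.no_grad():
        # float32 scratch buffer, so erfinv_ has a CPU kernel to run on.
        work = torch.empty_like(tensor, dtype=torch.float32)
        lower = norm_cdf((a - mean) / std)
        upper = norm_cdf((b - mean) / std)
        # Inverse-CDF sampling: uniform draw, then erfinv, then scale and shift.
        work.uniform_(2 * lower - 1, 2 * upper - 1)
        work.erfinv_()
        work.mul_(std * math.sqrt(2.0))
        work.add_(mean)
        work.clamp_(min=a, max=b)
        # copy_ casts back to the original dtype (e.g. float16) in place.
        tensor.copy_(work)
    return tensor


if __name__ == "__main__":
    latent = torch.empty(1, 8, dtype=torch.float16)
    trunc_normal_fp32_(latent, std=0.02, a=-0.04, b=0.04)
    print(latent.dtype)
```

Since this only touches weight initialization, the extra float32 buffer exists briefly at model construction and is gone before inference starts.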
Mar 16 '24 07:03
whab