moondream Nvidia Jetson runtime failure

 python3 sample.py --image img.png --prompt "hi"

Mar 13 '24 05:03 Links17

I get a different type of error on my Jetson Orin with JetPack 5.1.2:

Using device: cuda
If you run into issues, pass the --cpu flag to this script.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Traceback (most recent call last):
  File "sample.py", line 32, in <module>
    moondream = Moondream.from_pretrained(
  File "/home/theuser/Devs/GitHub/MoonDream/MoonDreamEnv/lib/python3.8/site-packages/transformers/modeling_utils.py", line 3462, in from_pretrained
    model = cls(config, *model_args, **model_kwargs)
  File "/home/theuser/Devs/GitHub/MoonDream/moondream/moondream/moondream.py", line 16, in __init__
    self.vision_encoder = VisionEncoder()
  File "/home/theuser/Devs/GitHub/MoonDream/moondream/moondream/vision_encoder.py", line 98, in __init__
    VisualHolder(timm.create_model("vit_so400m_patch14_siglip_384"))
  File "/home/theuser/Devs/GitHub/MoonDream/MoonDreamEnv/lib/python3.8/site-packages/timm/models/_factory.py", line 117, in create_model
    model = create_fn(
  File "/home/theuser/Devs/GitHub/MoonDream/MoonDreamEnv/lib/python3.8/site-packages/timm/models/vision_transformer.py", line 2598, in vit_so400m_patch14_siglip_384
    model = create_vision_transformer(
  File "/home/theuser/Devs/GitHub/MoonDream/MoonDreamEnv/lib/python3.8/site-packages/timm/models/vision_transformer.py", line 1764, in create_vision_transformer
    return build_model_with_cfg(
  File "/home/theuser/Devs/GitHub/MoonDream/MoonDreamEnv/lib/python3.8/site-packages/timm/models/_builder.py", line 385, in build_model_with_cfg
    model = model_cls(**kwargs)
  File "/home/theuser/Devs/GitHub/MoonDream/MoonDreamEnv/lib/python3.8/site-packages/timm/models/vision_transformer.py", line 525, in __init__
    self.attn_pool = AttentionPoolLatent(
  File "/home/theuser/Devs/GitHub/MoonDream/MoonDreamEnv/lib/python3.8/site-packages/timm/layers/attention_pool.py", line 63, in __init__
    self.init_weights()
  File "/home/theuser/Devs/GitHub/MoonDream/MoonDreamEnv/lib/python3.8/site-packages/timm/layers/attention_pool.py", line 68, in init_weights
    trunc_normal_tf_(self.latent, std=self.latent_dim ** -0.5)
  File "/home/theuser/Devs/GitHub/MoonDream/MoonDreamEnv/lib/python3.8/site-packages/timm/layers/weight_init.py", line 94, in trunc_normal_tf_
    _trunc_normal_(tensor, 0, 1.0, a, b)
  File "/home/theuser/Devs/GitHub/MoonDream/MoonDreamEnv/lib/python3.8/site-packages/timm/layers/weight_init.py", line 32, in _trunc_normal_
    tensor.erfinv_()
RuntimeError: "erfinv_vml_cpu" not implemented for 'Half'

I made sure to run versions of torch and torchvision that are compatible with CUDA 11.4: torch 2.1.0a0+41361538.nv23.6 and torchvision 0.16.2.
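The kernel name in the error, "erfinv_vml_cpu", suggests that the failing call ran on a CPU tensor in half precision, even though the script reports the cuda device. A small probe like the one below can show whether a given torch build has that kernel. This is only a sketch: the function name half_erfinv_supported is made up for illustration and is not part of torch, timm, or moondream.

```python
import torch


def half_erfinv_supported(device="cpu"):
    """Return True if in-place erfinv works on a float16 tensor on `device`.

    Hypothetical helper for diagnosis only.
    """
    # Build the values in float32 first, then cast, so that only the
    # erfinv_ call itself is being tested in half precision.
    x = torch.tensor([0.0, 0.5, -0.5], dtype=torch.float32)
    x = x.to(device=device, dtype=torch.float16)
    try:
        x.erfinv_()
    except RuntimeError:
        # Builds without the kernel raise a RuntimeError such as:
        # "erfinv_vml_cpu" not implemented for 'Half'
        return False
    return True


if __name__ == "__main__":
    print("torch", torch.__version__)
    print("float16 erfinv on CPU:", half_erfinv_supported("cpu"))
```

If the probe prints False for the CPU, any weight initialization that calls erfinv on a float16 CPU tensor can be expected to fail in the same way.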

Mar 16 '24 05:03 whab

I fixed my issue with sample.py by modifying the code in python3.8/site-packages/timm/layers/weight_init.py to convert the tensor to float32 before the incompatible operations and back to float16 afterwards. This code appears to run only during weight initialization, before inference starts, so it does not seem to affect inference performance.

Running the webcam_gradio_demo.py demo (which, by the way, is not affected by the 'Half' issue above) on my Jetson Orin Dev Kit, I get an update every 2 seconds or even a bit less. That is slightly faster than my Mac mini M2 Pro running MoonDream with Torch on MPS ;)
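The modified code was not posted, so the following is only a minimal sketch of the float32 round trip described above, not the actual patch. The function name trunc_normal_fp32_ and its default arguments are assumptions chosen for illustration; it fills a tensor from a truncated normal distribution by doing the arithmetic, including erfinv, on a float32 working copy and then copying the result back into the original tensor.

```python
import math

import torch


def trunc_normal_fp32_(tensor, mean=0.0, std=1.0, a=-2.0, b=2.0):
    """Fill `tensor` in place from a truncated normal, computing in float32.

    Hypothetical helper (not timm's implementation): it avoids calling
    erfinv_ on a float16 tensor, which some CPU builds do not support.
    """

    def norm_cdf(x):
        # Cumulative distribution function of the standard normal.
        return (1.0 + math.erf(x / math.sqrt(2.0))) / 2.0

    with torch.no_grad():
        # float32 working copy with the same shape and device as `tensor`.
        work = torch.empty_like(tensor, dtype=torch.float32)
        lo = norm_cdf((a - mean) / std)
        hi = norm_cdf((b - mean) / std)
        # Sample uniformly, then map through the inverse error function.
        work.uniform_(2 * lo - 1, 2 * hi - 1)
        work.erfinv_()
        work.mul_(std * math.sqrt(2.0)).add_(mean)
        work.clamp_(min=a, max=b)
        # copy_ converts back to the original dtype (for example float16).
        tensor.copy_(work)
    return tensor


if __name__ == "__main__":
    latent = torch.empty(4, 8, dtype=torch.float16)
    trunc_normal_fp32_(latent, std=0.5)
    print(latent.dtype, latent.shape)
```

Keep in mind that an edit made directly inside site-packages is overwritten when the package is reinstalled or upgraded, so the change may need to be reapplied.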

Mar 16 '24 07:03 whab