Application of TTA to EVA-02 Detection Model
We want to run multi-scale inference using the "GeneralizedRCNNWithTTA" class provided by detectron2.
The class works with standard detectron2 models, but the RoPE module of the modified ViT backbone raises an error indicating dimensions that differ from the configured image size. Is it possible to use TTA with this model, and if so, how should I go about it?
backbone.net.rope_glb.freqs_cos will not be loaded. Please double check and see if this is desired.
Shape of backbone.net.rope_glb.freqs_sin in checkpoint is torch.Size([9216, 64]), while shape of backbone.net.rope_glb.freqs_sin in model is torch.Size([14400, 64]).
backbone.net.rope_glb.freqs_sin will not be loaded. Please double check and see if this is desired.
Skip loading parameter 'backbone.net.rope_glb.freqs_cos' to the model due to incompatible shapes: (9216, 64) in the checkpoint but (14400, 64) in the model! You might want to double check if this is expected.
Skip loading parameter 'backbone.net.rope_glb.freqs_sin' to the model due to incompatible shapes: (9216, 64) in the checkpoint but (14400, 64) in the model! You might want to double check if this is expected.
Some model parameters or buffers are not found in the checkpoint:
backbone.net.blocks.11.attn.rope.{freqs_cos, freqs_sin}
backbone.net.blocks.14.attn.rope.{freqs_cos, freqs_sin}
backbone.net.blocks.17.attn.rope.{freqs_cos, freqs_sin}
backbone.net.blocks.2.attn.rope.{freqs_cos, freqs_sin}
backbone.net.blocks.20.attn.rope.{freqs_cos, freqs_sin}
backbone.net.blocks.23.attn.rope.{freqs_cos, freqs_sin}
backbone.net.blocks.5.attn.rope.{freqs_cos, freqs_sin}
backbone.net.blocks.8.attn.rope.{freqs_cos, freqs_sin}
backbone.net.rope_glb.{freqs_cos, freqs_sin}
The checkpoint state_dict contains keys that are not used by the model:
pixel_mean
pixel_std
/home/ailab/ed/EVA-02/EVA/EVA-02/det/detectron2/modeling/meta_arch/rcnn.py:273: UserWarning: square_size=1536, is smaller than max_size=3840 in batch
warnings.warn("square_size={}, is smaller than max_size={} in batch".format(
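The mismatched shapes above are consistent with a precomputed RoPE table whose row count equals the number of tokens in a square patch grid. Assuming a 16px patch size (an inference from the numbers, not stated in the log), 9216 corresponds to a 1536px checkpoint resolution, 14400 to the 1920px configured model, and the 57600 seen later at inference to the 3840px TTA max scale:

```python
PATCH = 16  # assumed ViT patch size; inferred from the shapes in the log

def rope_rows(img_side: int) -> int:
    """RoPE table rows for a square img_side x img_side input."""
    grid = img_side // PATCH
    return grid * grid

print(rope_rows(1536))  # 9216  -> shape in the checkpoint
print(rope_rows(1920))  # 14400 -> shape in the configured model
print(rope_rows(3840))  # 57600 -> shape at the TTA max scale
```

Because every TTA scale produces a different token count, a single fixed-size frequency buffer cannot match all of them.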
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
Cell In[3], line 2
1 ori_image = "/data/ed/iseg/TF_model_datasets/TF_ISEG_PJ2/lx_valid_images/Selection/Camera_FrontLeft_00_2019Y06M24D19H52m37s_0154.png"
----> 2 pred, vis = do_inference(cfg, model, ori_image, img_scale = (1920, 1200))
3 vis.save('/home/ailab/ed/EVA-02/EVA/EVA-02/det/demo/vis_output.png')
File /home/ailab/ed/EVA-02/EVA/EVA-02/det/demo/inference.py:81, in do_inference(cfg, model, ori_image, img_scale)
78 inputs = {"image": image, "height": height, "width": width}
80 with torch.no_grad():
---> 81 predictions = model([inputs])
82 instances = predictions[0]["instances"].to(torch.device("cpu"))
83 vis_output = visualizer.draw_instance_predictions(predictions=instances)
File /home/ailab/ed/EVA-02/EVA/EVA-02/det/detectron2/modeling/test_time_augmentation_custom.py:200, in GeneralizedRCNNWithTTA_custom.__call__(self, batched_inputs)
197 ret["width"] = image.shape[2]
198 return ret
--> 200 return [self._inference_one_image(_maybe_read_image(x)) for x in batched_inputs]
File /home/ailab/ed/EVA-02/EVA/EVA-02/det/detectron2/modeling/test_time_augmentation_custom.py:200, in <listcomp>(.0)
197 ret["width"] = image.shape[2]
198 return ret
--> 200 return [self._inference_one_image(_maybe_read_image(x)) for x in batched_inputs]
File /home/ailab/ed/EVA-02/EVA/EVA-02/det/detectron2/modeling/test_time_augmentation_custom.py:215, in GeneralizedRCNNWithTTA_custom._inference_one_image(self, input)
...
File /home/ailab/ed/EVA-02/EVA/EVA-02/det/detectron2/modeling/backbone/utils.py:348, in VisionRotaryEmbeddingFast.forward(self, t)
--> 348 def forward(self, t): return t * self.freqs_cos + rotate_half(t) * self.freqs_sin
RuntimeError: The size of tensor a (57600) must match the size of tensor b (14400) at non-singleton dimension 2
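One workaround sometimes used for RoPE at variable resolution is to rebuild (and cache) the frequency tables for whatever grid the current input produces, instead of relying on a single precomputed buffer. The sketch below is illustrative only: the class and helper names are hypothetical, and the frequency layout is a generic 2D axial RoPE, not necessarily EVA-02's exact scheme.

```python
import torch


def rotate_half(x: torch.Tensor) -> torch.Tensor:
    # Standard RoPE helper: swap the two halves of the last dim, negating one.
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)


class DynamicRope2D(torch.nn.Module):
    """Illustrative 2D RoPE that rebuilds its tables for the current grid."""

    def __init__(self, head_dim: int, theta: float = 10000.0):
        super().__init__()
        self.head_dim = head_dim
        self.theta = theta
        self._cache = {}  # (grid side, device, dtype) -> (cos, sin)

    def _tables(self, side: int, device, dtype):
        key = (side, device, dtype)
        if key not in self._cache:
            half = self.head_dim // 2  # half the channels per spatial axis
            freqs = 1.0 / (
                self.theta ** (torch.arange(0, half, 2, device=device).float() / half)
            )
            pos = torch.arange(side, device=device).float()
            ang = torch.outer(pos, freqs)                     # (side, half/2)
            # Concatenate row and column angles per token of the square grid.
            ang_y = ang[:, None, :].expand(side, side, -1)
            ang_x = ang[None, :, :].expand(side, side, -1)
            full = torch.cat([ang_y, ang_x], dim=-1).reshape(side * side, -1)
            full = torch.cat([full, full], dim=-1)            # pair for rotate_half
            self._cache[key] = (full.cos().to(dtype), full.sin().to(dtype))
        return self._cache[key]

    def forward(self, t: torch.Tensor) -> torch.Tensor:
        # t: (..., n_tokens, head_dim); infer the square grid from n_tokens.
        side = int(t.shape[-2] ** 0.5)
        cos, sin = self._tables(side, t.device, t.dtype)
        return t * cos + rotate_half(t) * sin
```

With something along these lines replacing the fixed-buffer `VisionRotaryEmbeddingFast`, each TTA scale would get a matching table; the simpler alternative is to restrict the TTA scales so every resized input yields the token count the buffers were built for.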