MonoDETR
Visualizations of attention maps in depth cross-attention
Hello, may I ask whether the visualization in Figure 5 is drawn directly from `attn_output_weights.sum(dim=1)/num_heads` of the depth cross-attention layer, that is, the attention weights averaged over all heads? Why do the maps drawn from my trained model look so different from yours?
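For reference, here is a minimal sketch of what `attn_output_weights.sum(dim=1)/num_heads` computes. It assumes plain `torch.nn.MultiheadAttention` as a stand-in for the depth cross-attention layer; MonoDETR's actual module may differ. The sizes (50 queries, a 12×40 depth-feature map, 8 heads) are hypothetical. The sketch checks that summing the per-head weights and dividing by `num_heads` matches PyTorch's default head-averaged weights. It then reshapes one query's row back onto the feature grid so it can be plotted.

```python
import torch
import torch.nn as nn

# Hypothetical sizes: 1 image, 50 object queries, a 12x40 depth-feature map, 8 heads.
batch, num_queries, h, w = 1, 50, 12, 40
embed_dim, num_heads = 256, 8

torch.manual_seed(0)
# Assumption: a plain PyTorch MHA layer stands in for the depth cross-attention layer.
attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True).eval()

queries = torch.randn(batch, num_queries, embed_dim)  # object queries
depth_tokens = torch.randn(batch, h * w, embed_dim)   # flattened depth embeddings (keys/values)

with torch.no_grad():
    # Per-head weights, shape (batch, num_heads, num_queries, h*w).
    _, per_head = attn(queries, depth_tokens, depth_tokens,
                       need_weights=True, average_attn_weights=False)
    # PyTorch's default: weights already averaged over heads, shape (batch, num_queries, h*w).
    _, averaged = attn(queries, depth_tokens, depth_tokens, need_weights=True)

# The formula from the question: sum over the head dimension, then divide by num_heads.
manual = per_head.sum(dim=1) / num_heads

# Attention map of query 0, reshaped back onto the depth-feature grid for plotting.
query_map = manual[0, 0].reshape(h, w)
```

In PyTorch's implementation, the returned weights are taken after attention dropout. For that reason, the sketch calls `.eval()` before extracting the maps. Whether training-mode dropout, a different layer, or a different query explains the mismatch with Figure 5 is not confirmed here.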