LLaVA-HR
Understanding how MR-Adapter works
Great work! May I know the intuitive reasons why the MR-Adapter is designed this way?
- Why do we need a Conv block for the low-resolution features but an MLP for the high-resolution features?
- What's the reason behind having the gate g in [-1, 1] before aggregating the high-resolution features?
- What's the reason for adding the original features Fvl in equation (3)? Is it to help gradient flow?
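For reference, here is how I currently read the aggregation in equation (3), as a toy sketch. All names are mine, and the simple scaling functions only stand in for the actual Conv block and MLP, so please correct me if this misrepresents the design:

```python
import math
import random

# Illustrative assumptions, not the repo's code: a single feature vector
# of toy dimension d for each branch.
random.seed(0)
d = 4
F_vl = [random.gauss(0, 1) for _ in range(d)]  # low-resolution features
F_vh = [random.gauss(0, 1) for _ in range(d)]  # high-resolution features

def toy_mlp(x):
    # Stand-in for the MLP applied to the high-resolution branch.
    return [0.5 * v for v in x]

def toy_conv(x):
    # Stand-in for the Conv block applied to the low-resolution branch.
    return [0.2 * v for v in x]

alpha = 0.3            # learnable scalar; tanh bounds the gate to (-1, 1)
g = math.tanh(alpha)

# Equation (3) as I read it: residual F_vl, plus the conv branch,
# plus the gated high-resolution MLP branch.
out = [fl + cl + g * mh
       for fl, cl, mh in zip(F_vl, toy_conv(F_vl), toy_mlp(F_vh))]

assert len(out) == d
assert -1.0 < g < 1.0
```

My guess is that the residual F_vl term lets the adapter start close to identity (g near 0 early in training), but I'd appreciate confirmation.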