torchtune
Assert that the Llama Vision image size is divisible by 14
The image size must be divisible by the ViT patch size used in the CLIP encoder, which is 14.
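A minimal sketch of the kind of check this describes, assuming a standalone helper. The names `check_image_size` and `PATCH_SIZE` are hypothetical and used only for illustration; they are not taken from the torchtune code base, and the real assertion may live elsewhere and be worded differently.

```python
# Hypothetical helper, not torchtune's actual implementation.
PATCH_SIZE = 14  # ViT patch size in the CLIP encoder, per the description above


def check_image_size(image_size: int, patch_size: int = PATCH_SIZE) -> int:
    """Validate image_size and return the number of patches along one side.

    Raises ValueError if image_size is not a positive multiple of patch_size.
    """
    if image_size <= 0:
        raise ValueError(f"image_size must be positive, got {image_size}")
    if image_size % patch_size != 0:
        raise ValueError(
            f"image_size ({image_size}) must be divisible by "
            f"patch_size ({patch_size})"
        )
    # Integer division is exact here because divisibility was checked above.
    return image_size // patch_size
```

For example, `check_image_size(560)` returns 40 patches per side, while `check_image_size(500)` raises a `ValueError` because 500 is not a multiple of 14. Failing early like this is likely preferable to letting a shape mismatch surface later inside the encoder.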