torchtune

Assert Llama Vision image size is divisible by 14

Open pbontrager opened this issue 4 months ago • 3 comments

The image size must be divisible by the ViT patch size used in the CLIP encoder, which is 14.
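
A minimal sketch of the kind of check being requested, in plain Python. The helper name `validate_image_size` and the constant `PATCH_SIZE` are hypothetical and are not part of torchtune's API; where such a check would live in the library is left open here.

```python
PATCH_SIZE = 14  # ViT patch size in the CLIP encoder, as stated in this issue


def validate_image_size(image_size: int, patch_size: int = PATCH_SIZE) -> int:
    """Return the number of patches per side, or raise if the size is invalid.

    Hypothetical helper for illustration only; not a torchtune function.
    """
    if image_size <= 0 or patch_size <= 0:
        raise ValueError(
            f"image_size and patch_size must be positive, "
            f"got {image_size} and {patch_size}"
        )
    if image_size % patch_size != 0:
        raise ValueError(
            f"image_size ({image_size}) must be divisible by "
            f"patch_size ({patch_size})"
        )
    # Each side of the image is split into this many non-overlapping patches.
    return image_size // patch_size
```

For example, `validate_image_size(560)` passes because 560 = 14 × 40, while `validate_image_size(500)` raises a `ValueError`. Raising an exception rather than using a bare `assert` is one possible design choice, since assertions are skipped when Python runs with `-O`.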

pbontrager, Oct 02 '24 21:10