llama-models

Published 20 hours ago


refactor: make llama3 generation closer to llama4

Open ashwinb opened this issue 8 months ago • 0 comments

Make the llama3 and llama4 generators simpler and closer to each other. There is a lot of duplicated code between them that needs to be removed.

Test Plan

Run all variants of the matrix:

MODEL in (llama3, llama4)
QUANT in (none, fp8_mixed, int4_mixed)

NGPUS=1
MODEL=llama3
QUANT=fp8_mixed
CHECKPOINT_DIR=~/.llama/checkpoints/Llama3.2-11B-Vision-Instruct/
torchrun --nproc-per-node=$NGPUS -m models.$MODEL.scripts.completion \
   $CHECKPOINT_DIR  --world_size $NGPUS --quantization_mode $QUANT
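
The full matrix above could be driven by a small loop rather than by editing the variables by hand. This is only a sketch. The `run_matrix` and `run` helpers and the `DRY_RUN` switch are hypothetical names added for illustration, and reusing a single `CHECKPOINT_DIR` for both models is an assumption; in practice each model most likely needs its own checkpoint path. By default the script only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch: enumerate every (MODEL, QUANT) pair from the test plan.
# DRY_RUN is a hypothetical switch: 1 (default) prints commands, 0 executes them.
NGPUS="${NGPUS:-1}"
# Placeholder path (assumption): point this at the checkpoint for the model under test.
CHECKPOINT_DIR="${CHECKPOINT_DIR:-/path/to/checkpoint}"
DRY_RUN="${DRY_RUN:-1}"

# Print a command, and run it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

# Loop over the matrix: MODEL in (llama3, llama4) x QUANT in (none, fp8_mixed, int4_mixed).
run_matrix() {
  local model quant
  for model in llama3 llama4; do
    for quant in none fp8_mixed int4_mixed; do
      run torchrun --nproc-per-node="$NGPUS" -m "models.$model.scripts.completion" \
        "$CHECKPOINT_DIR" --world_size "$NGPUS" --quantization_mode "$quant"
    done
  done
}

run_matrix
```

Running it with `DRY_RUN=0` would launch the six combinations one after another; leaving the default lets you check the generated commands first.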

Apr 07 '25 00:04 ashwinb