llama-models
refactor: make llama3 generation closer to llama4
Make the llama3 and llama4 generators simpler and closer to each other. A lot of duplicated code remains and still needs to be removed.
Test Plan
Run all variants of the matrix:
- MODEL in (llama3, llama4)
- QUANT in (none, fp8_mixed, int4_mixed)
NGPUS=1
MODEL=llama3
QUANT=fp8_mixed
CHECKPOINT_DIR=~/.llama/checkpoints/Llama3.2-11B-Vision-Instruct/
torchrun --nproc-per-node=$NGPUS -m models.$MODEL.scripts.completion \
$CHECKPOINT_DIR --world_size $NGPUS --quantization_mode $QUANT
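To cover the whole matrix rather than a single variant, the command above can be wrapped in a loop. This is only a sketch: `LLAMA3_CHECKPOINT_DIR`, `LLAMA4_CHECKPOINT_DIR`, `DRY_RUN`, and `run_matrix` are names made up for illustration, the llama4 checkpoint path is a placeholder that must be set by hand, and it assumes `--quantization_mode` accepts the literal value `none`.

```shell
#!/bin/sh
# Sketch: run the completion script for every MODEL x QUANT combination.
# All variable names below except NGPUS are hypothetical helpers.
NGPUS=${NGPUS:-1}
LLAMA3_CHECKPOINT_DIR=${LLAMA3_CHECKPOINT_DIR:-"$HOME/.llama/checkpoints/Llama3.2-11B-Vision-Instruct/"}
# Placeholder: point this at a real llama4 checkpoint before a real run.
LLAMA4_CHECKPOINT_DIR=${LLAMA4_CHECKPOINT_DIR:-/path/to/llama4/checkpoint}
# DRY_RUN=1 (default) only prints the commands; DRY_RUN=0 executes them.
DRY_RUN=${DRY_RUN:-1}

run_matrix() {
  for model in llama3 llama4; do
    # Pick the checkpoint directory that matches the model family.
    case "$model" in
      llama3) ckpt="$LLAMA3_CHECKPOINT_DIR" ;;
      llama4) ckpt="$LLAMA4_CHECKPOINT_DIR" ;;
    esac
    for quant in none fp8_mixed int4_mixed; do
      # Build the same command line as the single-variant example.
      set -- torchrun "--nproc-per-node=$NGPUS" -m "models.$model.scripts.completion" \
        "$ckpt" --world_size "$NGPUS" --quantization_mode "$quant"
      if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
      else
        # Stop at the first failing variant.
        "$@" || return 1
      fi
    done
  done
}

run_matrix
```

With the default `DRY_RUN=1` this prints the six commands without launching anything, which makes it easy to check the matrix before committing GPU time.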