llama3
Scaling configurations (Table 4) in the paper "The Llama 3 Herd of Models"
In Table 4 of the paper, the total GPU count of 16,384 does not match the parallelism configuration [8, 16, 16, 4], whose product is only 8 × 16 × 16 × 4 = 8,192. Is this a mistake in the paper?
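For reference, the mismatch can be checked directly by taking the product of the listed parallelism degrees (this is just the arithmetic from the question; the variable names are illustrative):

```python
import math

# Parallelism degrees as listed in Table 4 of "The Llama 3 Herd of Models"
parallelism = [8, 16, 16, 4]

# The total GPU count implied by a parallelism configuration is the
# product of its degrees.
implied_gpus = math.prod(parallelism)
reported_gpus = 16384

print(implied_gpus)                   # 8192
print(reported_gpus // implied_gpus)  # 2 -- off by exactly a factor of two
```

The implied count is 8,192, exactly half of the reported 16,384, which suggests either a typo in one of the degrees or a factor-of-two dimension omitted from the table.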