lit-llama
changing `devices` to `fabric.world_size` in the pretrain code
Hello,
According to our discussion here, I think `devices` should be changed in the pretraining code to `fabric.world_size`, since the batch size refers to the global batch size, while `devices` in the code is only the number of GPUs on a single node:

`process_batch_size = batch_size // fabric.world_size`
I believe the same applies to `max_iters = 600000  # num_epochs * (epoch_size // micro_batch_size) // devices`.
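Something like the sketch below is what I have in mind; the hyperparameter values and the CPU `Fabric` setup are only illustrative, not the actual pretraining configuration:

```python
# Minimal sketch of the proposed change, assuming the hyperparameter names used
# in the lit-llama pretraining scripts (batch_size, micro_batch_size,
# gradient_accumulation_iters). Values and the CPU setup are illustrative only.
import lightning as L

fabric = L.Fabric(accelerator="cpu", devices=1)  # real runs use multiple GPUs/nodes
fabric.launch()

batch_size = 128      # global batch size across all processes and all nodes
micro_batch_size = 4  # per-device batch size for one forward/backward pass

# Before (only correct on a single node, where `devices` equals the world size):
#   process_batch_size = batch_size // devices
# After (also correct for multi-node training):
process_batch_size = batch_size // fabric.world_size
gradient_accumulation_iters = process_batch_size // micro_batch_size

fabric.print(f"world_size={fabric.world_size}, "
             f"process_batch_size={process_batch_size}, "
             f"gradient_accumulation_iters={gradient_accumulation_iters}")
```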
Hi @LamOne1
The suggestion sounds good to me for `process_batch_size = batch_size // fabric.world_size`. The reason it was not done for Shakespeare is that multi-machine training is not really needed for that amount of data, and since the RedPajama script was based on the same script, the pattern was carried over. In any case, using the world size would be correct in the general case.
For `max_iters`, honestly I think it should be kept as "infinite" for practical reasons, but I'm fine with either if it doesn't complicate things.
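For illustration, the two options would look roughly like this; `num_epochs`, `epoch_size`, and the world size below are placeholder numbers, not values from the script:

```python
# Placeholder numbers only; in the script, world_size would be fabric.world_size.
micro_batch_size = 4
world_size = 8
num_epochs = 1
epoch_size = 50_000_000  # number of training samples

# Option A: keep max_iters effectively "infinite" and stop the run manually.
max_iters_fixed = 600_000

# Option B: derive max_iters from the data size, dividing by the world size
# rather than the single-node `devices` count.
max_iters_derived = num_epochs * (epoch_size // micro_batch_size) // world_size

print(max_iters_fixed, max_iters_derived)
```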