Add a 3-stage PP config

Open wconstab opened this issue 1 year ago • 0 comments

Stack from ghstack (oldest at bottom):

  • -> #345
  • #344
  • #354

Pipelining is unique in that there is no need to stick to power-of-2 numbers of stages, and there may be reasons an odd number is optimal, depending on how you divide up your cluster.
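To illustrate why a non-power-of-2 stage count is workable, here is a minimal sketch of dividing a model's layers into contiguous pipeline stages. The helper `split_layers` and the 32-layer figure are hypothetical, chosen only for illustration; this is not how torchtitan picks its split points.

```python
def split_layers(n_layers: int, n_stages: int) -> list[range]:
    """Split layer indices into n_stages contiguous, near-equal chunks.

    Hypothetical helper for illustration only: earlier stages absorb the
    remainder when n_layers is not divisible by n_stages.
    """
    if n_stages <= 0 or n_stages > n_layers:
        raise ValueError("need 1 <= n_stages <= n_layers")
    base, extra = divmod(n_layers, n_stages)
    stages = []
    start = 0
    for stage in range(n_stages):
        # The first `extra` stages each take one additional layer.
        size = base + (1 if stage < extra else 0)
        stages.append(range(start, start + size))
        start += size
    return stages


# Assumed example: a 32-layer model across 3 stages.
stage_layers = split_layers(32, 3)
stage_sizes = [len(r) for r in stage_layers]
```

With 3 stages the split is slightly uneven (11, 11, and 10 layers in this assumed example), which is one reason the choice of split points matters more for odd stage counts than for power-of-2 ones.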

Anyway, I use this to validate the 1f1b schedule in a setup that is slightly more complicated than 2 stages but simpler than 4 stages.
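For reference, the per-stage operation order in a textbook 1F1B (one-forward-one-backward) schedule can be sketched as below. This is a generic illustration under stated assumptions, not torchtitan's or PyTorch's implementation; the function name `one_f_one_b_order` and the choice of 4 microbatches are hypothetical.

```python
def one_f_one_b_order(stage: int, n_stages: int, n_microbatches: int) -> list[tuple[str, int]]:
    """Return the ("F" | "B", microbatch_index) order for one stage.

    Textbook 1F1B sketch: earlier stages run a few warmup forwards,
    then alternate one forward with one backward, then drain backwards.
    """
    warmup = min(n_stages - stage - 1, n_microbatches)
    ops = []
    fwd = bwd = 0
    # Warmup: forwards only, to fill the pipeline.
    for _ in range(warmup):
        ops.append(("F", fwd))
        fwd += 1
    # Steady state: one forward followed by one backward.
    while fwd < n_microbatches:
        ops.append(("F", fwd))
        fwd += 1
        ops.append(("B", bwd))
        bwd += 1
    # Cooldown: remaining backwards.
    while bwd < n_microbatches:
        ops.append(("B", bwd))
        bwd += 1
    return ops


# Assumed example: 3 stages, 4 microbatches.
first_stage = one_f_one_b_order(0, 3, 4)
last_stage = one_f_one_b_order(2, 3, 4)
```

With 3 stages, the first stage has two warmup forwards, the middle stage one, and the last stage none, so all three warmup depths are exercised, whereas a 2-stage setup only covers the depths 1 and 0.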

It seems to run fine if run with an even batch size (--training.batch_size 12).

wconstab · May 18 '24 00:05