Tensor parallelism with the current QK norm

Open · wimh966 opened this issue 4 months ago · 1 comment

❓ The question

Suppose I need to mid-train the 7B model. How can tensor parallelism be enabled with the current QK norm? It currently computes the mean/std over the entire hidden dimension, but tensor parallelism shards that dimension across ranks, so no single rank sees all of it. Thank you.
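To make the difficulty concrete: under Megatron-style tensor parallelism the Q and K projections are usually column-parallel, so each rank holds only a slice of the projected hidden vector. A norm whose mean/std spans the full hidden dimension (as the question describes) therefore cannot be computed locally. One common workaround is to all-reduce the per-rank partial statistics across the tensor-parallel group before normalizing. The sketch below simulates this in plain Python under that assumption, with the all-reduce replaced by a sum over simulated shards; in PyTorch the corresponding step would be a `torch.distributed.all_reduce` with a SUM op over the TP process group. The helper names (`split`, `all_reduce_sum`, `sharded_layer_norm`, `naive_sharded_layer_norm`) are hypothetical and not taken from OLMo or OLMo-core, and the norm's learnable scale is omitted (it would be sharded along the same dimension and applied locally).

```python
import math
from typing import List


def split(x: List[float], tp: int) -> List[List[float]]:
    """Split a hidden vector into `tp` contiguous shards, roughly as a
    column-parallel Q/K projection would leave it across ranks."""
    assert len(x) % tp == 0, "hidden size must be divisible by the TP degree"
    n = len(x) // tp
    return [x[i * n:(i + 1) * n] for i in range(tp)]


def all_reduce_sum(partials: List[List[float]]) -> List[float]:
    """Stand-in for a SUM all-reduce over the TP group: every rank ends up
    with the elementwise sum of all ranks' partial statistics."""
    return [sum(vals) for vals in zip(*partials)]


def layer_norm(x: List[float], eps: float = 1e-5) -> List[float]:
    """Reference: mean/std normalization over the full (unsharded) vector."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def sharded_layer_norm(shards: List[List[float]], eps: float = 1e-5) -> List[List[float]]:
    """Normalize each shard using statistics of the *full* vector.

    Each rank contributes (sum, sum of squares, count) for its own slice;
    a single all-reduce gives every rank the global totals, so the result
    matches the unsharded norm. The single-pass E[x^2] - E[x]^2 formula is
    less numerically stable than a two-pass one, which would instead need
    two all-reduces (one for the mean, one for squared deviations).
    """
    partials = [[sum(s), sum(v * v for v in s), float(len(s))] for s in shards]
    total, total_sq, count = all_reduce_sum(partials)
    mean = total / count
    var = total_sq / count - mean * mean
    inv_std = 1.0 / math.sqrt(var + eps)
    return [[(v - mean) * inv_std for v in s] for s in shards]


def naive_sharded_layer_norm(shards: List[List[float]], eps: float = 1e-5) -> List[List[float]]:
    """Without communication: each rank normalizes only its own slice,
    which silently changes the function being computed."""
    return [layer_norm(s, eps) for s in shards]


if __name__ == "__main__":
    x = [0.5, -1.0, 2.0, 3.5, -0.25, 1.0, 4.0, -2.0]
    shards = split(x, tp=2)
    ref = layer_norm(x)
    good = [v for s in sharded_layer_norm(shards) for v in s]
    bad = [v for s in naive_sharded_layer_norm(shards) for v in s]
    # Synchronized statistics match the reference up to float rounding.
    print(max(abs(a - b) for a, b in zip(ref, good)))
    # Per-shard statistics diverge from the reference.
    print(max(abs(a - b) for a, b in zip(ref, bad)))
```

The extra communication is one small all-reduce (three scalars per token) per Q and K norm, which is the price of keeping the norm defined over the full hidden dimension. The alternative, normalizing per head or per shard, avoids communication but changes the model's function, which matters when continuing from a checkpoint trained with the full-dimension norm.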

Aug 18 '25 13:08 wimh966

Hi there, thanks for the question! Our 7B model was trained with our old trainer, which unfortunately doesn't support tensor parallelism. Our new trainer does, so you might be able to port your setup to it; it lives in our OLMo-core repo (tensor parallelism in the new trainer: here).

Aug 29 '25 00:08 baileykuehl