OLMo
Tensor Parallelism with the current qk norm.
❓ The question
Suppose I need to do mid-training on the 7B model. How can I enable tensor parallelism with the current qk norm? The problem is that it currently computes the avg/std over all hidden dimensions, not per shard. Thank you.
Hi there, thanks for the question! Our 7B model was trained with our old trainer, which unfortunately doesn't have an option for tensor parallelism. Our new trainer, located in our OLMo-core repo, does have this option, so you might be able to port your setup to it (tensor parallelism in the new trainer: here)
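
To illustrate the concern raised in the question: when the hidden dimension is split across tensor-parallel ranks, a norm over the full dimension needs global statistics. One common way to get them is for each rank to compute partial sums over its shard and then sum those across ranks (an all-reduce in a real setup). Below is a minimal sketch in plain Python that simulates the ranks with list slices. It is not OLMo or OLMo-core code: the function names are hypothetical, the learnable scale/bias are omitted, and it assumes a mean/std-style norm as described in the question.

```python
import math

def full_layer_norm(x, eps=1e-5):
    """Reference: normalize over the entire hidden dimension on one device."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def sharded_layer_norm(shards, eps=1e-5):
    """Simulated tensor-parallel version: each 'rank' holds one shard."""
    # Each rank computes local partial statistics: count, sum, sum of squares.
    partials = [(len(s), sum(s), sum(v * v for v in s)) for s in shards]
    # Stand-in for an all-reduce (sum) across ranks.
    n = sum(p[0] for p in partials)
    total = sum(p[1] for p in partials)
    total_sq = sum(p[2] for p in partials)
    mean = total / n
    var = total_sq / n - mean * mean  # Var[x] = E[x^2] - E[x]^2
    inv_std = 1.0 / math.sqrt(var + eps)
    # Each rank normalizes only its own shard, using the global statistics.
    return [[(v - mean) * inv_std for v in s] for s in shards]

# Hypothetical 8-dimensional hidden vector split across 2 simulated ranks.
x = [0.5, -1.0, 2.0, 3.5, -0.25, 1.25, 0.0, -2.0]
shards = [x[:4], x[4:]]

reference = full_layer_norm(x)
combined = [v for s in sharded_layer_norm(shards) for v in s]
max_diff = max(abs(a - b) for a, b in zip(reference, combined))
```

Here `max_diff` should be at floating-point noise level, showing that the sharded computation matches the full-dimension norm once the partial sums are combined. In a real distributed setup the sum across ranks would be a collective call, and the `E[x^2] - E[x]^2` form can lose precision in low-precision dtypes, so statistics are usually accumulated in float32.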