Eric Buehler
@lucasavila00, it looks like there are some merge conflicts.
@lucasavila00, this looks good. However, I think there is one more merge conflict.
@lucasavila00, I think there are unfortunately still some merge conflicts.
@lucasavila00 thank you for adding this!
I think we should start the sampling timing there, since that excludes the sync point.
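To illustrate the idea (a minimal sketch in Python/PyTorch rather than the actual mistral.rs Rust code): synchronize the device first, then start the sampling timer, so the cost of the sync point is not attributed to sampling. The names `sample_fn` and `logits` are placeholders, not anything from this repository.

```python
import time
import torch

def time_sampling(sample_fn, logits: torch.Tensor):
    """Time only the sampling step, excluding pending GPU work.

    Synchronizing *before* starting the timer means the measured
    interval covers sampling alone, not the sync point.
    """
    if logits.is_cuda:
        torch.cuda.synchronize()      # drain pending kernels first
    start = time.perf_counter()       # timer starts after the sync point
    token = sample_fn(logits)         # the work we actually want to measure
    elapsed = time.perf_counter() - start
    return token, elapsed
```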
Done in a previous PR which didn't close this.
To use cross-entropy loss, configure the training loop to compute the standard CE loss over the model's output logits.
@crossxxd, we do not train against a target for the X-LoRA classifier's scalings output in the paper, although you could try that. We just train the model as normal, with the CE loss...
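For anyone following along, here is a minimal PyTorch-style sketch of what "train as normal with the CE loss" could look like. This is an illustration, not code from this repository or the xlora library: `model`, `batch`, and `optimizer` are placeholders, and it assumes an HF-style causal LM whose forward pass returns `.logits`. The classifier's scalings receive gradients only through this ordinary next-token cross-entropy loss; there is no separate loss term for the scalings output.

```python
import torch
import torch.nn.functional as F

def train_step(model, batch, optimizer):
    """One step of standard next-token cross-entropy training."""
    input_ids = batch["input_ids"]   # (batch, seq_len)
    labels = batch["labels"]         # (batch, seq_len), -100 marks ignored positions
    logits = model(input_ids=input_ids).logits  # (batch, seq_len, vocab)

    # Shift so that position i predicts token i+1, as in causal LM training.
    shift_logits = logits[:, :-1, :].contiguous()
    shift_labels = labels[:, 1:].contiguous()

    loss = F.cross_entropy(
        shift_logits.view(-1, shift_logits.size(-1)),
        shift_labels.view(-1),
        ignore_index=-100,
    )
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```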
@lucasavila00, I think there are merge conflicts here.
@lucasavila00, does this require #198?