_githubsgi
@tianyu-l , I did try it out. It works with some changes: 1. The C4 en dataset, training and validation, is prepared slightly differently. 2. The dataset has to be...
@tianyu-l , please let me know your thoughts. I am planning to open a PR.
@tianyu-l , please see if this makes sense to you. As you pointed out, integrating the MLPerf Llama3 8B model into TorchTitan would allow running a widely used performance benchmark...
@wconstab , there are essentially two minor differences that I can see as far as data prep goes: 1. A download script is provided by MLCommons for downloading the...
@wconstab , any comment? I also looked into the loss function for Llama3 8B. It is cross entropy, so with a base-2 log, perplexity = 2**loss; a target perplexity of 3.3 therefore corresponds to a loss of 1.72246602447...
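The perplexity/loss conversion above can be sketched in a few lines. This is a minimal illustration of the base-2 convention used in the comment (perplexity = 2**loss); note that if the cross-entropy loss is reported in nats (as PyTorch's `CrossEntropyLoss` does), perplexity = e**loss instead. The helper names below are hypothetical, for illustration only.

```python
import math

def perplexity_from_loss(loss: float, base: float = 2.0) -> float:
    """Perplexity given a cross-entropy loss expressed in log-base `base`."""
    return base ** loss

def loss_from_perplexity(ppl: float, base: float = 2.0) -> float:
    """Cross-entropy loss (in log-base `base`) given a target perplexity."""
    return math.log(ppl, base)

# A target perplexity of 3.3 corresponds to a base-2 loss of ~1.7225,
# and converting back recovers the original perplexity.
print(loss_from_perplexity(3.3))            # ~1.72246602447
print(perplexity_from_loss(1.72246602447))  # ~3.3

# With a natural-log (nats) loss, the same perplexity maps to ln(3.3):
print(loss_from_perplexity(3.3, base=math.e))
```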
@wconstab and @tianyu-l , any comments on the above? Do you see a benefit in adding a condensed version of this issue to the README page?