torchrec dlrm fixes

Add a useful error check. Fix the model's training-mode state so that it is on when it should be. Stop adding validation samples to the training set. Stop validation from running after a single training step. Fix a warning that was clogging the terminal. Wrap evaluation in with torch.no_grad() so that gradient computation is disabled where it should be (model.eval() does not disable it). Pass the mmap_mode variable to the function that needs it. Add a --print_freq CLI option to control the progress display update interval, in seconds.
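The point about model.eval() versus torch.no_grad() can be checked with a minimal sketch. The small nn.Sequential model and the random batch below are hypothetical stand-ins, not the DLRM code this PR touches; the sketch only illustrates that eval() changes layer behaviour while no_grad() is what turns off gradient tracking.

```python
import torch
from torch import nn

# Hypothetical stand-in model and batch; the PR itself targets the torchrec DLRM example.
model = nn.Sequential(nn.Linear(4, 8), nn.Dropout(p=0.5), nn.Linear(8, 1))
batch = torch.randn(16, 4)

# eval() only switches layer behaviour (e.g. dropout becomes a no-op);
# autograd still records operations on the parameters.
model.eval()
out_eval_only = model(batch)

# no_grad() is what actually disables gradient tracking during validation.
with torch.no_grad():
    out_no_grad = model(batch)

# Switch back to training mode before the next training step.
model.train()
out_train = model(batch)

print(out_eval_only.requires_grad)  # True
print(out_no_grad.requires_grad)    # False
print(model.training)               # True
```

Using both together during validation is the usual pattern: eval() for correct layer behaviour, no_grad() to avoid building the autograd graph, and train() afterwards so that training resumes in the right mode.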

Apr 08 '22 14:04 samiwilf

@samiwilf has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Apr 08 '22 14:04 facebook-github-bot

@samiwilf has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Apr 08 '22 15:04 facebook-github-bot

@samiwilf has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Apr 08 '22 16:04 facebook-github-bot
