torchrec
Repro the ghost processes for TorchAsyncITer
Summary: First, generate the data:
bash nvt_preproc.sh /data/criteo/ /data/criteo_1_day/ 8192
Then run the command:
torchx run -s local_cwd dist.ddp -j 1x8 --script train_torchrec.py -- --num_embeddings_per_feature 45833188,36746,17245,7413,20243,3,7114,1441,62,29275261,1572176,345138,10,2209,11267,128,4,974,14,48937457,11316796,40094537,452104,12606,104,35 --over_arch_layer_sizes 1024,1024,512,256,1 --binary_path /data/criteo_1_day/criteo_preproc/train/
Then use nvidia-smi to check for the ghost processes.
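As a rough aid for the last step, here is a minimal shell sketch that lists processes still holding GPU memory after the torchx job has exited. It assumes nothing else is using the GPUs, so every remaining compute process is treated as a possible ghost. The helper names `list_gpu_procs` and `count_ghosts` are hypothetical and are not part of torchrec or torchx. `--query-compute-apps` and `--format=csv,noheader` are standard nvidia-smi options.

```shell
#!/usr/bin/env bash
# Sketch only: run this after the torchx job has exited.
# Assumption: no other workload uses these GPUs, so any compute process
# still listed by nvidia-smi is a candidate "ghost" left behind by the run.

# Hypothetical helper: print one CSV row per GPU compute process,
# formatted as "pid, process_name, used_memory" (no header line).
list_gpu_procs() {
  nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv,noheader
}

# Hypothetical helper: read rows on stdin and print how many are non-empty.
# grep -c exits 1 when it counts zero lines, so "|| true" keeps the
# output at a single "0" without signalling failure.
count_ghosts() {
  grep -c . || true
}
```

Usage, once training has finished: source the functions, then run `list_gpu_procs` to see the surviving PIDs and `list_gpu_procs | count_ghosts` to get a count. A non-zero count means some processes outlived the job.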
Differential Revision: D37794009
This pull request was exported from Phabricator.