torchrec

Repro the ghost processes for TorchAsyncITer

Open · RenfeiChen-FB opened this issue 3 years ago · 1 comment

Summary: First, generate the data:

bash nvt_preproc.sh /data/criteo/ /data/criteo_1_day/ 8192

Then run the command:

torchx run -s local_cwd dist.ddp -j 1x8 --script train_torchrec.py -- --num_embeddings_per_feature 45833188,36746,17245,7413,20243,3,7114,1441,62,29275261,1572176,345138,10,2209,11267,128,4,974,14,48937457,11316796,40094537,452104,12606,104,35 --over_arch_layer_sizes 1024,1024,512,256,1 --binary_path /data/criteo_1_day/criteo_preproc/train/

Then use nvidia-smi to check for the ghost processes.
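The nvidia-smi check can also be scripted. The following is a minimal sketch, not part of this PR. It assumes that a "ghost" process is any process still holding a CUDA context after the torchx job has exited, and the helper names `parse_compute_apps` and `list_gpu_processes` are hypothetical. The `--query-compute-apps` and `--format=csv,noheader,nounits` options are standard nvidia-smi flags.

```python
import subprocess
from typing import List, Optional, Tuple

# Standard nvidia-smi query: one CSV line per process holding a compute
# context on any GPU, without a header row or unit suffixes.
QUERY_CMD = [
    "nvidia-smi",
    "--query-compute-apps=pid,process_name,used_memory",
    "--format=csv,noheader,nounits",
]


def parse_compute_apps(csv_text: str) -> List[Tuple[int, str, Optional[int]]]:
    """Parse nvidia-smi CSV output into (pid, process_name, used_memory_MiB) tuples.

    Hypothetical helper. used_memory is None when nvidia-smi reports "[N/A]".
    """
    apps = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        # pid is the first field and used_memory the last; splitting this way
        # tolerates commas inside process_name.
        pid, rest = line.split(",", 1)
        name, mem = rest.rsplit(",", 1)
        mem = mem.strip()
        apps.append((int(pid), name.strip(), int(mem) if mem.isdigit() else None))
    return apps


def list_gpu_processes() -> List[Tuple[int, str, Optional[int]]]:
    """Run nvidia-smi and return every process still holding GPU memory."""
    out = subprocess.run(QUERY_CMD, capture_output=True, text=True, check=True).stdout
    return parse_compute_apps(out)
```

Under the assumption above, calling `list_gpu_processes()` after the `torchx run` command has returned should yield an empty list. Any PID it still reports is a candidate ghost process to inspect, for example with `ps -p <pid> -o pid,ppid,cmd`.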

Differential Revision: D37794009

RenfeiChen-FB, Jul 12 '22 20:07

This pull request was exported from Phabricator. Differential Revision: D37794009

facebook-github-bot, Jul 12 '22 20:07