Perf Issues on documentation of word2vec training

Hi, I have some questions about https://github.com/PaddlePaddle/Perf/blob/master/Word2Vec/readme.md

The command to run on a single machine should be python -u ../../../../tools/static_ps_trainer.py -m benchmark.yaml, right? The readme currently has ../../../
After I run the command above, training starts, but on a single machine it seems to need a long time to finish. Where could I set iterations? Or must I wait for at least one epoch to finish? Thank you very much! @MrChengmo @luotao1

Apr 21 '21 12:04 lidanqing-intel

Hi, at https://github.com/PaddlePaddle/PaddleRec/tree/master/models/recall/word2vec the static model training command python -u ../../../tools/static_trainer.py -m config.yaml is also out of date because static_trainer.py has been deleted. Should we use the PaddleRec 2.0.0 branch? I saw that the last commit on 2.0.0 was in January. Or should we still use the develop version? Thank you very much!

Apr 21 '21 12:04 lidanqing-intel

Hi, I have some questions about https://github.com/PaddlePaddle/Perf/blob/master/Word2Vec/readme.md

The command to run on a single machine should be python -u ../../../../tools/static_ps_trainer.py -m benchmark.yaml, right? The readme currently has ../../../

After I run the command above, training starts, but on a single machine it seems to need a long time to finish. Where could I set iterations? Or must I wait for at least one epoch to finish? Thank you very much! @MrChengmo @luotao1

Please refer to this link: https://github.com/PaddlePaddle/Perf/tree/master/Word2Vec
The code for reproducing the results is located at: https://github.com/PaddlePaddle/PaddleRec/tree/master/models/recall/word2vec/benchmark
A round of full data training takes more than 40 hours, so if you only want to test performance, it can be quickly tested on small samples.

Apr 22 '21 03:04 MrChengmo

Hi, at https://github.com/PaddlePaddle/PaddleRec/tree/master/models/recall/word2vec the static model training command python -u ../../../tools/static_trainer.py -m config.yaml is also out of date because static_trainer.py has been deleted. Should we use the PaddleRec 2.0.0 branch? I saw that the last commit on 2.0.0 was in January. Or should we still use the develop version? Thank you very much!

We recommend using the master branch code for training.

Apr 22 '21 03:04 MrChengmo

where could I set iterations

https://github.com/PaddlePaddle/PaddleRec/blob/master/models/recall/word2vec/benchmark/benchmark.yaml#L26

runner:
  epochs: 15
  print_interval: 100
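
For a quick test, one option is to run from a copy of the config with a smaller epochs value. The sketch below is only an illustration: set_epochs is a hypothetical helper, it works on the config text with a regular expression rather than a YAML parser, and it assumes the file contains a single indented `epochs:` line like the fragment above. Note that lowering epochs only limits the number of passes over the data; it does not shorten a single pass.

```python
import re


def set_epochs(config_text: str, epochs: int) -> str:
    """Return config_text with the first `epochs: <int>` value replaced.

    Hypothetical helper: a plain text substitution, not a YAML-aware edit.
    """
    pattern = re.compile(r"^([ \t]*epochs:[ \t]*)\d+", flags=re.MULTILINE)
    new_text, count = pattern.subn(
        lambda m: f"{m.group(1)}{epochs}", config_text, count=1
    )
    if count == 0:
        raise ValueError("no 'epochs:' line found in config text")
    return new_text


# Example on the fragment quoted above (not the full benchmark.yaml).
example = "runner:\n  epochs: 15\n  print_interval: 100\n"
quick = set_epochs(example, 1)
print(quick)
```

The result can be written to a separate file and passed to the trainer with -m in place of benchmark.yaml, leaving the original config untouched.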

A round of full data training takes more than 40 hours

It means that each epoch takes more than 40 hours.

it can be quickly tested on small samples

You can select a small sample from the full data to test accuracy or performance.
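
As a rough illustration of taking a small sample, the sketch below copies the first lines of a text training file into a new file. write_head_sample and any paths passed to it are hypothetical, and it assumes the training data is line-oriented text; the actual data layout used by the benchmark is not described in this thread. Taking the head of a file is the simplest choice and may not be representative of the full corpus.

```python
import itertools


def write_head_sample(src_path: str, dst_path: str, max_lines: int) -> int:
    """Copy at most max_lines lines from src_path to dst_path.

    Returns the number of lines written. Hypothetical helper for
    building a small test set out of a large line-oriented file.
    """
    written = 0
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        # islice stops after max_lines without reading the rest of the file.
        for line in itertools.islice(src, max_lines):
            dst.write(line)
            written += 1
    return written
```

Pointing the training data path in the config at the directory holding the sampled file would then keep each epoch short.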

Apr 22 '21 03:04 luotao1
