Issues on documentation of word2vec training

Open lidanqing-intel opened this issue 3 years ago • 4 comments

Hi, I have some questions about https://github.com/PaddlePaddle/Perf/blob/master/Word2Vec/readme.md

  1. The command to run on a single machine should be python -u ../../../../tools/static_ps_trainer.py -m benchmark.yaml, right? Currently the readme uses ../../../
  2. After I run the command above, training starts, but on a single machine it seems to take a long time to finish. Where could I set iterations? Or must I wait for at least one epoch to finish? Thank you very much! @MrChengmo @luotao1
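
One way to check how many ../ segments the command needs is to walk up from the benchmark directory until the trainer script is found. The sketch below is only an illustration: find_trainer is a hypothetical helper, and it assumes the script sits at tools/static_ps_trainer.py under the repository root, as the command above implies.

```python
from pathlib import Path
from typing import Optional


def find_trainer(start: Path, script: str = "tools/static_ps_trainer.py") -> Optional[str]:
    """Walk up from `start` and return a relative path to `script`.

    Hypothetical helper: returns e.g. '../../tools/static_ps_trainer.py'
    for the nearest ancestor directory that contains `script`, or None
    if no ancestor contains it.
    """
    start = start.resolve()
    # depth 0 is `start` itself, depth 1 its parent, and so on.
    for depth, ancestor in enumerate([start, *start.parents]):
        if (ancestor / script).is_file():
            prefix = "/".join([".."] * depth) or "."
            return f"{prefix}/{script}"
    return None


if __name__ == "__main__":
    # Run from the directory that holds benchmark.yaml.
    print(find_trainer(Path.cwd()))
```

If the benchmark directory is models/recall/word2vec/benchmark under the repository root, the script is four levels up, which matches the ../../../../ form of the command.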

lidanqing-intel, Apr 21 '21 12:04

Hi, the static model training command at https://github.com/PaddlePaddle/PaddleRec/tree/master/models/recall/word2vec, python -u ../../../tools/static_trainer.py -m config.yaml, is also out of date because static_trainer.py has been deleted. Should we use the PaddleRec 2.0.0 branch? I see that the last commit on 2.0.0 was in January, though. Should we still use the develop version? Thank you very much!

lidanqing-intel, Apr 21 '21 12:04

Hi, I have some questions about https://github.com/PaddlePaddle/Perf/blob/master/Word2Vec/readme.md

  1. The command to run on a single machine should be python -u ../../../../tools/static_ps_trainer.py -m benchmark.yaml, right? Currently the readme uses ../../../
  2. After I run the command above, training starts, but on a single machine it seems to take a long time to finish. Where could I set iterations? Or must I wait for at least one epoch to finish? Thank you very much! @MrChengmo @luotao1
  • Please refer to this link: https://github.com/PaddlePaddle/Perf/tree/master/Word2Vec
  • The code for reproducing the results is located in: https://github.com/PaddlePaddle/PaddleRec/tree/master/models/recall/word2vec/benchmark
  • A round of full data training takes more than 40 hours. If you only want to test the performance, it can be quickly tested on small samples.

MrChengmo, Apr 22 '21 03:04

Hi, the static model training command at https://github.com/PaddlePaddle/PaddleRec/tree/master/models/recall/word2vec, python -u ../../../tools/static_trainer.py -m config.yaml, is also out of date because static_trainer.py has been deleted. Should we use the PaddleRec 2.0.0 branch? I see that the last commit on 2.0.0 was in January, though. Should we still use the develop version? Thank you very much!

We recommend using the master branch code for training

MrChengmo, Apr 22 '21 03:04

where could I set iterations

https://github.com/PaddlePaddle/PaddleRec/blob/master/models/recall/word2vec/benchmark/benchmark.yaml#L26

runner:
  epochs: 15
  print_interval: 100
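
As a rough sketch, the epochs value shown above could be lowered in a copy of the config with a few lines of Python. set_epochs is a hypothetical helper, not part of PaddleRec, and it edits the text directly, so it assumes the file contains a plain "epochs: <number>" line like the one quoted. Note that this only changes the number of passes over the data; it does not make a single epoch any shorter.

```python
import re


def set_epochs(yaml_text: str, epochs: int) -> str:
    """Return yaml_text with the first 'epochs: <int>' value replaced.

    Hypothetical helper: works on the raw text and keeps the original
    indentation, so no YAML library is needed.
    """
    pattern = re.compile(r"^([ \t]*epochs:[ \t]*)\d+", flags=re.MULTILINE)
    # A function replacement avoids ambiguity between the group
    # reference and the digits of the new value.
    new_text, count = pattern.subn(
        lambda m: f"{m.group(1)}{epochs}", yaml_text, count=1
    )
    if count == 0:
        raise ValueError("no 'epochs:' key found")
    return new_text


if __name__ == "__main__":
    sample = "runner:\n  epochs: 15\n  print_interval: 100\n"
    print(set_epochs(sample, 1))
```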

A round of full data training takes more than 40 hours

It means that each epoch takes more than 40 hours.

it can be quickly tested on small samples

You can select a small sample from the full data to test accuracy or performance.
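
A minimal sketch of one way to build such a sample, assuming the training data is a directory of line-oriented text files (the thread does not describe the data layout). make_small_sample and the directory arguments are hypothetical names used only for illustration.

```python
from itertools import islice
from pathlib import Path


def make_small_sample(src_dir: Path, dst_dir: Path, max_lines: int = 1000) -> int:
    """Copy the first max_lines lines of every file in src_dir to dst_dir.

    Hypothetical helper: returns the number of files written.
    Subdirectories are skipped.
    """
    dst_dir.mkdir(parents=True, exist_ok=True)
    written = 0
    for src in sorted(src_dir.iterdir()):
        if not src.is_file():
            continue
        dst = dst_dir / src.name
        with src.open("r", encoding="utf-8") as fin, \
                dst.open("w", encoding="utf-8") as fout:
            # islice stops after max_lines without reading the whole file.
            fout.writelines(islice(fin, max_lines))
        written += 1
    return written
```

The training data path in the config would then need to point at the sample directory; the key that controls it is not shown in this thread.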

luotao1, Apr 22 '21 03:04