
feature(nyz): add new middleware distributed demo

Open PaParaZz1 opened this issue 2 years ago • 6 comments

Description

  • [x] DataParallel demo
  • [x] DistributedDataParallel demo
  • [x] tb logger example
  • [x] Distributed RL demo (Ape-X type)
  • [ ] Distributed RL demo (APPO type)
  • [ ] Distributed RL demo (R2D2 type)
  • [ ] Distributed RL demo (IMPALA type)
  • [ ] Distributed RL demo (SEED RL type)
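The Ape-X item above refers to the pattern of many distributed actors feeding prioritized transitions into one central replay buffer that a single learner samples from. A minimal single-process sketch of that data flow (all names here are illustrative, not DI-engine's actual middleware API; a real Ape-X setup runs actors as separate processes and uses a sum-tree for sampling):

```python
import random
from collections import deque

class PrioritizedBuffer:
    """Toy prioritized replay buffer (Ape-X keeps one of these centrally)."""
    def __init__(self, capacity):
        self.data = deque(maxlen=capacity)

    def push(self, transition, priority):
        self.data.append((priority, transition))

    def sample(self, k):
        # Sample proportionally to priority (real Ape-X uses a sum-tree).
        priorities = [p for p, _ in self.data]
        return random.choices([t for _, t in self.data], weights=priorities, k=k)

def actor(actor_id, buffer, n_steps):
    """Each distributed actor collects transitions and assigns initial priorities."""
    for step in range(n_steps):
        transition = (actor_id, step)
        buffer.push(transition, priority=random.uniform(0.1, 1.0))

buffer = PrioritizedBuffer(capacity=1000)
for i in range(4):          # 4 actors; Ape-X would run these as separate processes
    actor(i, buffer, n_steps=50)
batch = buffer.sample(32)   # the central learner samples a prioritized batch
```

The APPO, R2D2, IMPALA, and SEED RL items differ mainly in what flows through this pipe (trajectories vs. transitions, observations vs. actions) and where inference runs.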

Related Issue

#102 #176

TODO

Check List

  • [ ] merge the latest version source branch/repo, and resolve all the conflicts
  • [ ] pass style check
  • [ ] pass all the tests

PaParaZz1 avatar May 15 '22 11:05 PaParaZz1

Codecov Report

Merging #321 (dde6009) into main (dd2b3a5) will decrease coverage by 0.59%. The diff coverage is 79.88%.

@@            Coverage Diff             @@
##             main     #321      +/-   ##
==========================================
- Coverage   85.39%   84.79%   -0.60%     
==========================================
  Files         532      556      +24     
  Lines       43943    44718     +775     
==========================================
+ Hits        37523    37919     +396     
- Misses       6420     6799     +379     
Flag Coverage Δ
unittests 84.79% <79.88%> (-0.60%) :arrow_down:

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
ding/data/buffer/tests/test_buffer_benchmark.py 37.70% <ø> (ø)
ding/entry/tests/test_cli_ditask.py 100.00% <ø> (ø)
ding/policy/base_policy.py 74.85% <ø> (+0.84%) :arrow_up:
ding/policy/sac.py 60.29% <ø> (-0.08%) :arrow_down:
...ework/middleware/functional/termination_checker.py 22.50% <16.66%> (-8.75%) :arrow_down:
ding/data/tests/test_model_loader.py 23.63% <23.63%> (ø)
ding/framework/tests/test_task.py 92.50% <35.71%> (-7.50%) :arrow_down:
ding/framework/middleware/functional/trainer.py 84.84% <40.00%> (-3.04%) :arrow_down:
ding/policy/dqn.py 87.34% <40.00%> (-1.54%) :arrow_down:
ding/framework/middleware/functional/enhancer.py 39.65% <42.85%> (-0.73%) :arrow_down:
... and 271 more


codecov[bot] avatar May 15 '22 12:05 codecov[bot]

What is the throughput of this? Does this beat SampleFactory? @PaParaZz1 @sailxjx

zxzzz0 avatar Jun 21 '22 21:06 zxzzz0

@zxzzz0 The goal here is not to compete with Sample Factory on raw speed. The bottleneck of RL training can appear in any of collecting, training, or evaluation: for example, collecting that is too fast relative to training can widen the generation gap in the data and cause the model to underfit, and because of the GIL, deserializing data on the training process can also slow down overall training. There are many such trade-offs to consider in this project. This issue provides a new design pattern for global RL training, starting from the idea that users should be able to scale from single-machine studies to large-scale distributed systems without large code-modification costs or performance losses. If environment-side collecting is really your main concern, you can use Sample Factory inside DI-engine to achieve the collecting efficiency you expect.
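The GIL point can be made concrete: even if collectors serialize data in parallel processes, the learner process still pays the deserialization cost itself, and that work holds the GIL. A small illustrative measurement (the payload here is arbitrary, not a real DI-engine batch):

```python
import pickle
import time

# Illustrative only: measure how long a learner process would spend just
# deserializing a batch received from collectors. This CPU work holds the
# GIL, so it cannot overlap with other Python threads in the same process.
batch = [[float(i) for i in range(1000)] for _ in range(100)]
blob = pickle.dumps(batch)

start = time.perf_counter()
restored = pickle.loads(blob)
elapsed = time.perf_counter() - start

print(f"payload: {len(blob)} bytes, deserialize: {elapsed * 1e3:.2f} ms")
```

At scale, this per-batch cost is one reason a "fast collector" alone does not guarantee fast end-to-end training.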

sailxjx avatar Jun 22 '22 01:06 sailxjx

If you are really very very concerned about environment-side collecting

No. To clarify, we only care about overall performance, i.e., the time it takes to reach a certain reward in the end.

Usually, if you can squeeze every drop of performance out of the CPU/GPU, you can learn faster. Environment-side collecting is just one indicator among many. You also have to pay attention to learner FPS, GPU utilization, and other indicators to understand the throughput of the whole system.

Benchmarking should target not the collector side alone but the overall growth speed of the reward.
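The indicators mentioned here (collector FPS, learner samples/s) can all be tracked with the same simple primitive. A minimal sketch of such a meter (names hypothetical, not an existing DI-engine utility):

```python
import time

class ThroughputMeter:
    """Tracks items processed per second for any stage (collector, learner, ...)."""
    def __init__(self):
        self.start = time.perf_counter()
        self.count = 0

    def update(self, n):
        """Record that n more items (frames, samples, ...) were processed."""
        self.count += n

    def rate(self):
        """Items per second since construction."""
        elapsed = time.perf_counter() - self.start
        return self.count / elapsed if elapsed > 0 else 0.0

collector_fps = ThroughputMeter()
learner_sps = ThroughputMeter()
collector_fps.update(4096)   # env frames collected so far
learner_sps.update(1024)     # samples consumed by the learner so far
```

Watching the ratio of these two rates is often more informative than either alone: it shows whether the collector or the learner is the current bottleneck.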

zxzzz0 avatar Jun 22 '22 14:06 zxzzz0

No. To clarify, we only care about overall performance, which means the time it will take to reach certain reward in the end.

Yeah, that's right: the purpose of the distributed version is to maximize overall performance while not requiring much effort to write code across multiple tasks.

Another consideration is that we need to go design-first. Only after the upper-layer interface is unified and stable will it be possible to gradually optimize every aspect of performance without disturbing users. You can see that from version 0.x to version 1.0 we have gradually developed a definite interface style, and the purpose of this branch is to extend that interface style to distributed operation.
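The "unified interface style" being discussed is a middleware pipeline: the user registers small functions on a task, and the same pipeline can then be run locally or distributed. A toy sketch of that pattern (this is an illustration of the idea, not DI-engine's exact `task` API):

```python
class Task:
    """Minimal pipeline runner: registered middleware run in order each iteration."""
    def __init__(self):
        self.middleware = []

    def use(self, fn):
        """Register a middleware function; returns self for chaining."""
        self.middleware.append(fn)
        return self

    def run(self, max_step):
        ctx = {"step": 0, "log": []}
        for _ in range(max_step):
            for fn in self.middleware:
                fn(ctx)
            ctx["step"] += 1
        return ctx

def collect(ctx):
    ctx["log"].append(("collect", ctx["step"]))

def train(ctx):
    ctx["log"].append(("train", ctx["step"]))

task = Task()
task.use(collect).use(train)   # the same pipeline could be split across processes
ctx = task.run(max_step=2)
```

Because the user only composes middleware, a distributed runtime can later reassign individual middleware to different processes or machines without changing user code, which is exactly the design-first benefit described above.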

sailxjx avatar Jun 23 '22 02:06 sailxjx

Sounds good. In the future, please benchmark the different designs/interfaces so that you can be confident you have chosen the design with the best overall performance.

If you don't benchmark (as I did earlier for DI-engine) and only discover a possible performance improvement after the design is frozen in version 1.0, you won't be able to change it without a major version update.

zxzzz0 avatar Jun 23 '22 14:06 zxzzz0