[Security] Support for torch lightning 1.6 and future support

Open yinweisu opened this issue 2 years ago • 5 comments

Currently the master branch supports TL 1.5. What's the plan and timeline regarding TL 1.6?

Also, we want to run distributed HPO with each trial itself being distributed, and found that Ray Tune recommends this project. We are a little worried about whether this project will continue to be supported, because its activity is relatively low. Do you have a roadmap?

yinweisu avatar Apr 22 '22 20:04 yinweisu

Hello! Ray Lightning is still actively maintained by the Ray team. There is a pull request to support TL 1.6, but some work likely needs to be done before that can happen. Also, have you looked at Ray Train? There is also the next-generation Ray AI Runtime, which is coming into beta soon: https://github.com/ray-project/ray/issues/22488. Your case sounds like a good fit there, so I would highly recommend taking a look at it as well. There will be dogfooding and feedback collection along with the beta rollout.

xwjiang2010 avatar Apr 23 '22 15:04 xwjiang2010

+1 for updating support for PyTorch Lightning 1.6; and adding a link to this open PR on ray_lightning.

dynamicwebpaige avatar Apr 24 '22 16:04 dynamicwebpaige

+1 for updating to 1.6.

MarkusSpanring avatar May 24 '22 09:05 MarkusSpanring

+1 for updating to 1.6. There are two high-severity security issues open for versions <1.6, tied to pyyaml:

  • https://security.snyk.io/vuln/SNYK-PYTHON-PYTORCHLIGHTNING-2419028
  • https://security.snyk.io/vuln/SNYK-PYTHON-PYTORCHLIGHTNING-2325279

gradientsky avatar Jun 10 '22 22:06 gradientsky

Hi @krfricke, with pytorch-lightning 1.6 the dev branch can currently run in a multi-node GPU setting, but one thing is missing: the trainer needs to be updated from the Ray workers, as was done for version 1.5:

https://github.com/ray-project/ray_lightning/blob/6aed848f757a03c03166c1a9bddfeea5153e7b90/ray_lightning/ray_ddp.py#L362-L401

For example, the network parameters in the Ray workers need to be passed back to trainer.model.

In version 1.6, post-dispatch is about to be deprecated:

https://github.com/Lightning-AI/lightning/blob/master/src/pytorch_lightning/strategies/strategy.py#L509-L515

Maybe the update from the Ray workers can be added in teardown:

https://github.com/Lightning-AI/lightning/blob/176ca1fdccefbb99ad610ad422c74f0c59653a9c/src/pytorch_lightning/strategies/strategy.py#L442-L453
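
A minimal sketch of that idea, assuming the workers return their trained weights to the driver. Every class and function here (`ToyModel`, `WorkerResult`, `train_on_worker`, `ToyStrategy`) is a hypothetical stand-in, not the ray_lightning or PyTorch Lightning API; only the `state_dict()` / `load_state_dict()` naming follows the usual PyTorch convention.

```python
# Hypothetical stand-ins only: this is NOT the ray_lightning or
# PyTorch Lightning API, just an illustration of syncing in teardown.
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToyModel:
    """Tiny model that mimics the state_dict()/load_state_dict() convention."""
    weights: Dict[str, float] = field(
        default_factory=lambda: {"w": 0.0, "b": 0.0}
    )

    def state_dict(self) -> Dict[str, float]:
        return dict(self.weights)

    def load_state_dict(self, state: Dict[str, float]) -> None:
        self.weights = dict(state)


@dataclass
class WorkerResult:
    """What a worker sends back to the driver after training."""
    rank: int
    state_dict: Dict[str, float]


def train_on_worker(rank: int, model: ToyModel) -> WorkerResult:
    # Stand-in for remote training: the worker updates its own copy,
    # so the driver-side model is left untouched.
    local = ToyModel(model.state_dict())
    local.weights["w"] += 1.0
    local.weights["b"] += 0.5
    return WorkerResult(rank=rank, state_dict=local.state_dict())


class ToyStrategy:
    """Hypothetical strategy that syncs worker weights in teardown()."""

    def __init__(self, model: ToyModel, num_workers: int) -> None:
        self.model = model
        self.num_workers = num_workers
        self._results: List[WorkerResult] = []

    def run(self) -> None:
        self._results = [
            train_on_worker(rank, self.model)
            for rank in range(self.num_workers)
        ]

    def teardown(self) -> None:
        # Copy the rank-0 worker's weights back into the driver-side model,
        # then drop the collected results.
        rank_zero = next((r for r in self._results if r.rank == 0), None)
        if rank_zero is not None:
            self.model.load_state_dict(rank_zero.state_dict)
        self._results = []
```

In this sketch the driver-side model only sees the trained weights after `teardown()` runs, which is the behavior the 1.5 code linked above handled in post-dispatch.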

JiahaoYao avatar Jun 21 '22 06:06 JiahaoYao