ray_lightning
PyTorch Lightning Distributed Accelerators using Ray
```python
/home/ray/anaconda3/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/callback_connector.py:151: LightningDeprecationWarning: Setting `Trainer(checkpoint_callback=True)` is deprecated in v1.5 and will be removed in v1.7. Please consider using `Trainer(enable_checkpointing=True)`.
```
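The warning itself names the fix: pass `enable_checkpointing=True` instead of the deprecated `checkpoint_callback=True` when building the `Trainer`. A minimal sketch, assuming the `RayPlugin` entry point described in this repo's README; the worker count is illustrative:

```python
import pytorch_lightning as pl
from ray_lightning import RayPlugin  # plugin entry point from the README

# Deprecated in PTL 1.5, removed in 1.7:
# trainer = pl.Trainer(checkpoint_callback=True, plugins=[RayPlugin(num_workers=4)])

# Preferred replacement:
trainer = pl.Trainer(
    enable_checkpointing=True,
    plugins=[RayPlugin(num_workers=4, use_gpu=False)],
)
```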
```python
(BaseHorovodWorker pid=379, ip=172.31.46.122) Missing logger folder: /home/ray/default/ray_lightning/ray_lightning/tests/lightning_logs
```
Using the flags to install Horovod, I ran into the following issue:
```shell
(tensorflow2_p38) ubuntu@ip-10-0-2-36:~/anaconda3/envs/tensorflow2_p38/lib/python3.8/site-packages/ray_lightning$ HOROVOD_WITH_TENSORFLOW=1 HOROVOD_WITH_TORCH=1 HOROVOD_WITH_GLOO=1 pip install --no-cache-dir horovod[tensorflow] horovod[ray] horovod[torch]
Looking in indexes: https://pypi.org/simple, https://pip.repos.neuron.amazonaws.com
Collecting...
```
Currently the master branch supports PTL 1.5. What is the plan and timeline regarding PTL 1.6? Also, we want to use distributed HPO with each trial itself being distributed, and found...
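For the distributed-HPO-with-distributed-trials use case, the usual pattern is to launch each Tune trial as a Ray Lightning run and reserve the per-trial worker bundles up front. A rough sketch, assuming the `RayPlugin`, `TuneReportCallback`, and `get_tune_resources` helpers described in this repo's README; `MyLightningModule` and the search space are placeholders:

```python
import pytorch_lightning as pl
from ray import tune
from ray_lightning import RayPlugin
from ray_lightning.tune import TuneReportCallback, get_tune_resources

def train_fn(config):
    model = MyLightningModule(lr=config["lr"])  # placeholder LightningModule
    trainer = pl.Trainer(
        max_epochs=4,
        # Report the metric Tune should optimize after each validation run.
        callbacks=[TuneReportCallback({"loss": "val_loss"}, on="validation_end")],
        # Each trial spawns its own group of Ray training workers.
        plugins=[RayPlugin(num_workers=2, use_gpu=True)],
    )
    trainer.fit(model)

analysis = tune.run(
    train_fn,
    config={"lr": tune.loguniform(1e-4, 1e-1)},
    num_samples=8,
    # Reserve the bundles the plugin's workers will need, so trials
    # don't oversubscribe the cluster.
    resources_per_trial=get_tune_resources(num_workers=2, use_gpu=True),
)
```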
When I use the Ray Lightning plugin for distributed training, I see two wandb experiments created: one that never logs anything (but has configs that were updated before calling `pl.Trainer.fit`),...
When using PBT/PB2, I received the following error:
```shell
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
```
This...
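One common source of this particular error (not necessarily the cause here) is restoring a checkpoint that was saved on GPU into a model that still lives on CPU, which PBT/PB2 does every time it exploits a trial. Mapping tensors to a single device at load time avoids the mismatch; the model and checkpoint path below are placeholders:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)  # stand-in for the real LightningModule

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
# map_location forces every restored tensor onto one device, so GPU-saved
# weights don't end up mixed with CPU tensors after an exploit step.
state = torch.load("checkpoint.pt", map_location=device)
model.load_state_dict(state)
model.to(device)
```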
### Search before asking

- [X] I had searched in the [issues](https://github.com/ray-project/ray/issues) and found no similar feature requirement.

### Description

Hello and happy that you show integration with PL! :tada:...
When training on GPU, I see the following warning:
```shell
2022-05-03 00:02:20,033 WARNING tune.py:637 -- Tune detects GPUs, but no trials are using GPUs. To enable trials to use GPUs,...
```
Running Ray Lightning with Tune has led to recurring confusion about how resources are handled (https://github.com/ray-project/ray_lightning/issues/138, https://github.com/ray-project/ray_lightning/issues/23). Currently, the Tune trainable process does not do any training and does not...
I tried to use Ray Lightning + Ray Tune to do distributed HPO and found that GPUs are not available within the trial even when I set `use_gpu` to True...
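This matches the resource model sketched in the previous issue: with Ray Lightning the Tune trainable itself is typically not assigned a GPU bundle, only the Ray workers launched by the plugin are, so checks inside the trial function can report no CUDA device even though training runs on GPU. A quick diagnostic sketch; the remote tasks below are illustrative, not part of the library:

```python
import ray
import torch

@ray.remote(num_gpus=1)
def gpu_worker():
    # A task that was assigned a GPU bundle sees it via Ray and CUDA.
    return ray.get_gpu_ids(), torch.cuda.is_available()

@ray.remote(num_gpus=0)
def cpu_worker():
    # A task with no GPU bundle typically has CUDA_VISIBLE_DEVICES restricted
    # by Ray, which is the situation the Tune trainable is in here.
    return ray.get_gpu_ids(), torch.cuda.is_available()

ray.init()
print("GPU worker:", ray.get(gpu_worker.remote()))  # e.g. ([0], True)
print("CPU worker:", ray.get(cpu_worker.remote()))  # e.g. ([], False)
```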