ray_lightning

PyTorch Lightning Distributed Accelerators using Ray

Results: 66 ray_lightning issues

Cross-referencing from https://discuss.ray.io/t/checkpointing-errors-on-complex-models/7637/4: while using TuneReportCheckpointCallback I get ``` Trial returned a result which did not include the specified metric ``` When using TuneReportCallback, such errors do not occur...

**Note**: This is an experimental PR (the main parts are from https://github.com/ray-project/ray_lightning/pull/196). The goal is to fix the last CI test.

Fix related to #213. This PR should also fix an unreachable code segment introduced in my previous PR, #208.

Cross-referencing from https://discuss.ray.io/t/checkpointing-errors-on-complex-models/7637/6: when using the population-based scheduler I get the error ``` raise ValueError( ValueError: To fetch the `best_config`, pass a `metric` and `mode` parameter to `tune.run()`. Alternatively,...
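
To illustrate why both a metric and a mode are needed before a "best" configuration can be chosen, here is a minimal pure-Python sketch. It is not Ray Tune's implementation; the function `best_config` and the trial dictionary layout are hypothetical and exist only for this example.

```python
def best_config(trials, metric, mode):
    """Pick the config of the best trial under (metric, mode).

    `trials` is a hypothetical list of dicts shaped like
    {"config": {...}, "result": {...}}; this is NOT Ray Tune's data model.
    """
    if mode not in ("min", "max"):
        raise ValueError("mode must be 'min' or 'max'")
    # Trials whose result lacks the metric cannot be ranked at all.
    scored = [t for t in trials if metric in t["result"]]
    if not scored:
        raise ValueError(f"no trial reported the metric {metric!r}")
    pick = max if mode == "max" else min
    return pick(scored, key=lambda t: t["result"][metric])["config"]
```

Without `metric` there is nothing to compare, and without `mode` it is undefined whether a larger or smaller value wins, which is consistent with the error message asking for both.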

## 🐛 Bug Deterministic mode is not set on all workers when `Trainer` is set to `deterministic=True`. ### To Reproduce The script is divided into two parts, `test.py` and `model.py`...

The driver node and rank 0 use the same path to save and load weights in ModelCheckpointCallback. It is possible that the driver node and rank 0 are not on the same machine, or...

### Problem statement When starting a hyperparameter search on a multi-GPU node (4 GPUs), I run into a mismatch of visible CUDA devices. Below is the full code to...
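
As a debugging aid for this kind of mismatch, the following sketch parses the `CUDA_VISIBLE_DEVICES` environment variable and compares the number of listed devices with an expected count. The helper names are hypothetical and are not taken from the issue or from ray_lightning; the sketch only inspects the environment and does not query the GPUs themselves.

```python
import os


def visible_cuda_devices(env=None):
    """Parse CUDA_VISIBLE_DEVICES into a list of device identifiers.

    Returns None when the variable is unset (no restriction applied).
    """
    env = os.environ if env is None else env
    raw = env.get("CUDA_VISIBLE_DEVICES")
    if raw is None:
        return None
    return [d.strip() for d in raw.split(",") if d.strip()]


def check_visible_devices(expected_count, env=None):
    """Return True if the number of listed devices equals expected_count."""
    devices = visible_cuda_devices(env)
    # An unset variable gives no information, so report a mismatch.
    if devices is None:
        return False
    return len(devices) == expected_count
```

Logging the result of `visible_cuda_devices()` inside each worker process is one way to see which devices a given worker was actually assigned.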

bug

Hi all! When I try to use multiple workers in the `DataLoader` by specifying `num_workers`, some of the processes stay alive after the run and occupy GPU memory. For my...

I get a "DataLoader killed by signal" error when I set the dataloader `num_workers` to anything other than 0 (a value of 0 disables multiprocessing). I'm running into bottlenecks due to this, is...

In ray_lightning, there is no version number in `__init__.py`, i.e., https://github.com/ray-project/ray_lightning/blob/main/ray_lightning/__init__.py However, in other packages you can find one: https://github.com/ray-project/ray/blob/master/python/ray/__init__.py#L114 ----- The benefit of this: users can find...
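
The convention the issue refers to can be sketched as follows. The module `example_pkg` and the version string `0.0.1` are placeholders invented for this example and do not reflect ray_lightning's actual contents; `get_version` is likewise a hypothetical helper.

```python
import types

# Hypothetical stand-in for a package's __init__.py; the module name and
# version string are placeholders for illustration only.
example_pkg = types.ModuleType("example_pkg")
example_pkg.__version__ = "0.0.1"


def get_version(module, default="unknown"):
    """Return the module's __version__ attribute, or a default if absent."""
    return getattr(module, "__version__", default)
```

With such an attribute in place, a user can check the installed version at runtime via `get_version(example_pkg)`, and a module that lacks the attribute falls back to the default instead of raising.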