
PyTorch Lightning Distributed Accelerators using Ray

66 ray_lightning issues

Hi, I am trying to use ray-lightning with Tune in a distributed environment, but I am not sure how to define `--num-workers`. Could you please help me to...
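A minimal sketch of how these pieces usually fit together, assuming the `ray_lightning.tune` helpers (`TuneReportCallback`, `get_tune_resources`) and a hypothetical `MyLightningModule`: `num_workers` is the number of distributed training workers per Tune trial, and the same value must be passed to `get_tune_resources` so Tune reserves matching resources.

```python
import pytorch_lightning as pl
from ray import tune
from ray_lightning import RayPlugin
from ray_lightning.tune import TuneReportCallback, get_tune_resources

num_workers = 4  # distributed training workers per Tune trial

def train_fn(config):
    model = MyLightningModule(config)  # hypothetical LightningModule
    trainer = pl.Trainer(
        max_epochs=10,
        plugins=[RayPlugin(num_workers=num_workers, use_gpu=True)],
        callbacks=[TuneReportCallback({"loss": "val_loss"}, on="validation_end")],
    )
    trainer.fit(model)

analysis = tune.run(
    train_fn,
    config={"lr": tune.loguniform(1e-4, 1e-1)},
    # Reserve the trial driver's CPU plus the workers' CPUs/GPUs.
    resources_per_trial=get_tune_resources(num_workers=num_workers, use_gpu=True),
    num_samples=8,
)
```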

I've been trying to run a distributed setup with Ray Lightning; however, it seems the worker nodes aren't downloading the data via my data module's prepare_data() function. I have set...
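For context, PyTorch Lightning only calls `prepare_data()` on the rank-0 process, so remote Ray workers on other nodes may never run the download. One possible workaround, sketched below with hypothetical `download_dataset` / `load_dataset` helpers, is to make the download idempotent and do it in `setup()`, which runs on every process:

```python
import os
import pytorch_lightning as pl

class MyDataModule(pl.LightningDataModule):
    def __init__(self, data_dir="/tmp/data"):
        super().__init__()
        self.data_dir = data_dir

    def setup(self, stage=None):
        # Unlike prepare_data(), setup() runs on every process, including
        # remote Ray workers. Guard the download so workers that already
        # have the data on their node don't repeat it.
        if not os.path.exists(self.data_dir):
            download_dataset(self.data_dir)  # hypothetical download helper
        self.train_set = load_dataset(self.data_dir, "train")  # hypothetical loader
```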

Hey @amogkam, PyTorch Lightning v1.6, coming in mid-March, would make the Accelerator API stable. PyTorch Lightning introduced [Launchers](https://github.com/PyTorchLightning/pytorch-lightning/tree/master/pytorch_lightning/strategies/launchers) for the Distributed Plugin, and I believe this would be a more...

The [GPUStatsMonitor](https://pytorch-lightning.readthedocs.io/en/stable/extensions/generated/pytorch_lightning.callbacks.GPUStatsMonitor.html#pytorch_lightning.callbacks.GPUStatsMonitor) callback records information about GPU utilization in the TensorBoard logs; however, when running with ray_lightning it raises a MisconfigurationException: `pytorch_lightning.utilities.exceptions.MisconfigurationException: You are using GPUStatsMonitor but are...`
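Until that is resolved, one possible workaround is a small custom callback that reads CUDA memory stats from inside each worker process (where the GPU actually is) and logs them through Lightning. This is a sketch, not the library's own fix:

```python
import torch
import pytorch_lightning as pl

class CudaMemoryLogger(pl.Callback):
    """Log CUDA memory from inside the worker process, where the GPU is visible."""

    def on_train_batch_end(self, trainer, pl_module, *args, **kwargs):
        if torch.cuda.is_available():
            pl_module.log("gpu_mem_allocated_mb",
                          torch.cuda.memory_allocated() / 1024 ** 2)
```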

I have set up the Ray GPU cluster (g4dn.12xlarge workers) with `ray up config.yaml` and have installed all package dependencies within the Ray config. However, when I attempt distributed training I get a `MisconfigurationException`:...
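If the cause is the usual one, it is that with ray_lightning GPUs are requested through the plugin rather than through `Trainer(gpus=...)`. A sketch of the expected wiring, assuming the cluster from `ray up config.yaml` is already running:

```python
import ray
import pytorch_lightning as pl
from ray_lightning import RayPlugin

# Attach to the cluster started by `ray up config.yaml`.
ray.init(address="auto")

# Request GPUs via the plugin; leave Trainer(gpus=...) unset, since each
# Ray worker is scheduled onto its own GPU by the plugin.
trainer = pl.Trainer(
    max_epochs=10,
    plugins=[RayPlugin(num_workers=4, num_cpus_per_worker=4, use_gpu=True)],
)
```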

Closes #99. GPU tests were run manually and all are passing. In PTL, 16-bit precision only works on GPU. You specify that you want GPUs to your Trainer...

Hey all, thanks for using the Ray Lightning library so far! In order to keep this library up to date and continue to build new features in a timely manner,...

Hi, thank you for the great integration of Lightning & Ray! I found that using 16-bit precision raises the following error: `pytorch_lightning.utilities.exceptions.MisconfigurationException: You have asked for native AMP on...`
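Native AMP requires CUDA in the process that runs the training loop, so the error usually means the Ray workers were started without GPUs. A minimal sketch of the combination that should work, under that assumption:

```python
import pytorch_lightning as pl
from ray_lightning import RayPlugin

# precision=16 (native AMP) only works when each Ray worker has a CUDA
# device, so the plugin must be created with use_gpu=True.
trainer = pl.Trainer(
    precision=16,
    plugins=[RayPlugin(num_workers=2, use_gpu=True)],
)
```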

Bumps [torch](https://github.com/pytorch/pytorch) from 1.8.1 to 1.9.0. Release notes Sourced from torch's releases. PyTorch 1.9 Release, including Torch.Linalg and Mobile Interpreter PyTorch 1.9 Release Notes Highlights Backwards Incompatible Change Deprecations New...

dependencies

I've noticed that while running on my SLURM cluster, if num_workers is set too high (as far as I can tell the threshold is arbitrary), the job starts...
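One way to avoid guessing is to size `num_workers` from what the Ray cluster actually reports via `ray.cluster_resources()`; the sizing heuristic below is an assumption, not a recommendation from the library:

```python
import ray

ray.init(address="auto")  # attach to the Ray cluster launched under SLURM

# Derive num_workers from the resources Ray can actually see, instead of
# picking a number that may exceed the allocation and stall the job.
resources = ray.cluster_resources()
gpus = int(resources.get("GPU", 0))
cpus = int(resources.get("CPU", 0))
num_workers = gpus if gpus > 0 else max(1, cpus - 1)  # keep one CPU for the driver
```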