LightningWork does not move to the GPU
First check
- [x] I'm sure this is a bug.
- [x] I've added a descriptive title to this bug.
- [x] I've provided clear instructions on how to reproduce the bug.
- [x] I've added a code sample.
- [x] I've provided any other important info that is required.
Bug description
I am trying to run an app with a LightningWork of type ServeGradio that should run on my local GPU.
I am passing `L.CloudCompute("gpu")` (I also tried `"cuda"`) to the LightningWork, but it does not seem to move my model to the GPU. When I try to move my model to the GPU explicitly, I get the following error: `Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method`.
When I do not try to move my model to the GPU, the process runs.
How to reproduce the bug
```python
import gradio as gr
import torch
import lightning as L
from lightning.app.components.serve import ServeGradio  # import path may vary with the lightning version

# MyPipeline is my own model pipeline (definition not shown).


class VideoServeGradio(ServeGradio):
    inputs = gr.Video()
    outputs = "playable_video"

    def __init__(self, cloud_compute, *args, **kwargs):
        super().__init__(*args, cloud_compute=cloud_compute, **kwargs)
        print("cuda", torch.cuda.is_available())  # this prints True

    def run(self):
        super().run()

    def predict(self, video):
        self.model(video)
        # Local path where the inferred video is saved.
        inferred_video_path = "./artifacts/out_vids/nonamevid.mp4"
        return inferred_video_path

    def build_model(self):
        print("cuda:", torch.cuda.is_available())  # this prints True as well
        pipe = MyPipeline(face_geometry_path=None)
        pipe.to("cuda")  # this line raises the error
        return pipe


class Flow(L.LightningFlow):
    def __init__(self):
        super().__init__()
        print("cuda:::::", torch.cuda.is_available())
        self.serve_work = VideoServeGradio(cloud_compute=L.CloudCompute("gpu"))

    def run(self):
        self.serve_work.run()

    def configure_layout(self):
        tab_2 = {"name": "Interactive demo", "content": self.serve_work}
        return [tab_2]


app = L.LightningApp(Flow(), debug=True)
```
Error messages and logs
```
Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
```
Important info
- Lightning Component (e.g. Trainer, LightningModule, LightningApp, LightningWork, LightningFlow):
- Lightning App Version (e.g., 0.5.2):
- PyTorch Version (e.g., 1.10):
- Python version (e.g., 3.9):
- OS (e.g., Linux):
- CUDA/cuDNN version:
- GPU models and configuration:
- How you installed Lightning (`conda`, `pip`, source):
- Running environment of LightningApp (e.g. local, cloud):
More info
No response
Hey @yuvals1.
Thanks for trying Lightning App.
Here are some explanations:
- `CloudCompute` is meant to specify which machine your work should run on in the cloud. Therefore, it has no effect locally.
- Calling `torch.cuda.is_available()` creates a CUDA context, which cannot be carried into a forked process. Remove all your prints ;) (see the sketch below).
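For context, here is a minimal, self-contained sketch of the PyTorch limitation at play (plain `torch.multiprocessing`, nothing Lightning-specific; in the PyTorch versions current at the time, even `torch.cuda.is_available()` could initialize a CUDA context). Once the parent process holds a CUDA context, a forked child cannot use CUDA, while a spawned child can:

```python
import torch
import torch.multiprocessing as mp


def worker():
    # The first real CUDA use happens in the child process, which is safe
    # only because the child was spawned rather than forked.
    x = torch.ones(1).cuda()
    print(x.device)


if __name__ == "__main__":
    torch.cuda.is_available()  # may initialize a CUDA context in the parent
    # With the default "fork" start method on Linux, the child would fail:
    #   RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use
    #   CUDA with multiprocessing, you must use the 'spawn' start method
    ctx = mp.get_context("spawn")
    p = ctx.Process(target=worker)
    p.start()
    p.join()
```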
Could you try this:
```python
class VideoServeGradio(ServeGradio):
    inputs = gr.Video()
    outputs = "playable_video"

    def run(self):
        super().run()

    def predict(self, video):
        # Move the model to CUDA inside the predict method.
        model = self.model.cuda()
        video = video.cuda()
        output = model(video)
        return output.cpu().item()

    def build_model(self):
        # Build on the CPU; no CUDA calls here.
        return MyPipeline(face_geometry_path=None)


class Flow(L.LightningFlow):
    def __init__(self):
        super().__init__()
        self.serve_work = VideoServeGradio()

    def run(self):
        self.serve_work.run()

    def configure_layout(self):
        tab_2 = {"name": "Interactive demo", "content": self.serve_work}
        return [tab_2]


app = L.LightningApp(Flow(), debug=True)
```
Hey @tchaton,
Thanks for the response.
I tried your suggestion, and unfortunately the process now can't find CUDA for some reason:
`CUDA driver initialization failed, you might not have a CUDA gpu.`
Any ideas why?
Hey @yuvals1. Some progress, different errors.
Mind trying this?
```python
def predict(self, video):
    # Pin the CUDA device, then move the model inside the predict method.
    torch.cuda.set_device(torch.device("cuda:0"))
    model = self.model.cuda()
    video = video.cuda()
    output = model(video)
    return output.cpu().item()
```
cc @awaelchli
Hi @yuvals1, is PyTorch working fine on that system otherwise? Please check that this works:
```
python -c "import torch; torch.rand(2).to('cuda:0')"
```
Because the error
`CUDA driver initialization failed, you might not have a CUDA gpu.`
would suggest that your system/display driver is perhaps outdated.
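If that one-liner also fails, a slightly more verbose check (a generic diagnostic sketch, not Lightning-specific) can help pinpoint whether the driver or device visibility is the problem:

```python
import torch

print(torch.version.cuda)         # CUDA version this PyTorch build targets
print(torch.cuda.is_available())  # False if the driver cannot be initialized
print(torch.cuda.device_count())  # number of GPUs visible to the process
```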
As @tchaton said, the error
`Cannot re-initialize CUDA in forked subprocess.`
comes from torch, and it seems that the `gradio.Interface().launch()` call we use under the hood creates its subprocess by forking. This is a limitation of torch, so all CUDA operations need to happen inside the `predict` function. Hmm, I'm not sure what we could do here.
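For anyone hitting the same wall, here is a minimal workaround sketch along those lines (illustrative only: `MyPipeline` is the user's own module, and the `_moved_to_gpu` flag is an ad-hoc assumption, not a Lightning API). The idea is to keep the parent process completely CUDA-free and move the model to the GPU the first time `predict` runs inside the serving subprocess:

```python
class VideoServeGradio(ServeGradio):
    inputs = gr.Video()
    outputs = "playable_video"

    def build_model(self):
        # Build strictly on the CPU: any CUDA call made before the fork
        # would poison the child process.
        return MyPipeline(face_geometry_path=None)

    def predict(self, video):
        # The first CUDA touch happens here, inside the serving subprocess.
        if not getattr(self, "_moved_to_gpu", False):
            self.model.to("cuda")
            self._moved_to_gpu = True
        self.model(video)
        return "./artifacts/out_vids/nonamevid.mp4"
```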
This issue has been automatically marked as stale because it hasn't had any recent activity. This issue will be closed in 7 days if no further activity occurs. Thank you for your contributions - the Lightning Team!