
Failure getting EditGAN to run in Colab

Open dubtor opened this issue 4 years ago • 12 comments

Thanks for sharing your code! I tried getting this to load up in Google Colab. After some hassle and experimenting, I got it to the state where it's loading up at least.

If I try to click something, like checking a box or uploading a file, I see the following errors (see the attached screenshots):

The UI seems unresponsive. It is not perfectly clear whether that is due to a problem, or because I don't know how to use the software. Any hints?

Thank you 🙏

dubtor avatar Apr 08 '22 07:04 dubtor

Well, I just noticed myself that it seems to have failed to load `demo_origin.js` -- which is probably why it cannot find the JS functions. Will update this issue.

dubtor avatar Apr 08 '22 08:04 dubtor

The demo JS is at `static/demo_origin.js`.

Could you please also let me know how it goes on Google Colab? I would like to help and update the Colab option in the released code if possible.

arieling avatar Apr 08 '22 19:04 arieling

Thank you @arieling - I have not yet gotten it to work properly, but I have at least solved the initial problem (which is why I am also adjusting the title of this ticket).

I managed to create a Colab environment using the mentioned package versions and added ngrok to `run_app.py` to create a tunnel from `localhost:8888` to a public URL. I had some trouble loading the local files from within `index.html` because there seemed to be CORS issues, so the files `demo_origin.js` and `demo.css` would not load. I was able to work around this for the moment by inlining both the script and the CSS into `index.html` itself.
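For reference, the inlining workaround can be scripted so it survives re-creating the Colab runtime. This is a minimal sketch, assuming the default file layout and that `index.html` references the assets with exactly the tags shown below (adjust the strings to whatever your `index.html` actually contains):

```python
from pathlib import Path

def inline_assets(html: str, js: str, css: str) -> str:
    """Replace external script/stylesheet references with inline content.

    The exact tag strings are assumptions about the template; adapt as needed.
    """
    html = html.replace(
        '<script src="static/demo_origin.js"></script>',
        '<script>\n' + js + '\n</script>')
    html = html.replace(
        '<link rel="stylesheet" href="static/demo.css">',
        '<style>\n' + css + '\n</style>')
    return html

if __name__ == "__main__":
    root = Path(".")
    page = (root / "index.html").read_text()
    js = (root / "static/demo_origin.js").read_text()
    css = (root / "static/demo.css").read_text()
    (root / "index.html").write_text(inline_assets(page, js, css))
```

This sidesteps the CORS problem entirely because the browser no longer makes separate requests for the JS and CSS files.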

I have the app loading now, and I can draw on the left and click the middle button (which, judging from the videos, is the 'process' button). Once I click it, the app appears to run out of CUDA memory.

This is where I am currently at. I don't know if the memory is really full, or if something else isn't working properly. I am running on Google Colab Pro+ with extended RAM.

The reported hardware is:

```
Sun Apr 10 12:35:19 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   35C    P0    27W / 250W |      0MiB / 16280MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
```

The log output of `run_app.py` is as follows:

```
ngrok: no process found
Starting server...
Server ready...
Open URL in browser: NgrokTunnel: "http://e171-34-90-74-42.ngrok.io/" -> "http://localhost:8888/"
 * Serving Flask app 'run_app' (lazy loading)
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: off
 * Running on all addresses.
   WARNING: This is a development server. Do not use it in a production deployment.
 * Running on http://172.28.0.2:8888/ (Press CTRL+C to quit)
Current working directory: /content
Experiment folder created at: ./static/samples
Experiment folder created at: ./static/results
Experiment folder created at: ./static/upload_latents
Load stylegan from, ./checkpoint/stylegan_pretrain/stylegan2_networks_stylegan2-car-config-f.pt at res, 512
make_mean_latent
Load Classifier path, ./checkpoint/datasetgan_pretrain/classifier
Setting up Perceptual loss...
Downloading: "https://download.pytorch.org/models/vgg16-397923af.pth" to /root/.cache/torch/checkpoints/vgg16-397923af.pth
100% 528M/528M [00:02<00:00, 247MB/s]
Loading model from: /content/EditGAN-Robert/lpips/weights/v0.1/vgg.pth
...[net-lin [vgg]] initialized
...Done
  0% 0/10 [00:00<?, ?it/s]/usr/local/lib/python3.8/site-packages/torch/nn/functional.py:2503: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
  warnings.warn("Default upsampling behavior when mode={} is changed "
100% 10/10 [00:11<00:00, 1.15s/it]
TOOL init!!
127.0.0.1 - - [10/Apr/2022 12:45:23] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [10/Apr/2022 12:45:24] "GET /static/loading.gif HTTP/1.1" 404 -
127.0.0.1 - - [10/Apr/2022 12:45:24] "GET /static/images/car_real/0.jpg HTTP/1.1" 200 -
127.0.0.1 - - [10/Apr/2022 12:45:24] "GET /brush_circle.png HTTP/1.1" 404 -
127.0.0.1 - - [10/Apr/2022 12:45:24] "GET /brush_square.png HTTP/1.1" 404 -
127.0.0.1 - - [10/Apr/2022 12:45:24] "GET /brush_diamond.png HTTP/1.1" 404 -
127.0.0.1 - - [10/Apr/2022 12:45:24] "GET /paint-brush.png HTTP/1.1" 404 -
127.0.0.1 - - [10/Apr/2022 12:45:24] "GET /paint-can.png HTTP/1.1" 404 -
127.0.0.1 - - [10/Apr/2022 12:45:24] "GET /eyedropper.png HTTP/1.1" 404 -
127.0.0.1 - - [10/Apr/2022 12:45:24] "GET /undo.png HTTP/1.1" 404 -
127.0.0.1 - - [10/Apr/2022 12:45:24] "GET /save.png HTTP/1.1" 404 -
127.0.0.1 - - [10/Apr/2022 12:45:24] "GET /run.png HTTP/1.1" 404 -
127.0.0.1 - - [10/Apr/2022 12:45:25] "GET /random.png HTTP/1.1" 404 -
127.0.0.1 - - [10/Apr/2022 12:45:25] "GET /images/car_real/0.jpg HTTP/1.1" 404 -
127.0.0.1 - - [10/Apr/2022 12:45:25] "GET /images/car_real/1.jpg HTTP/1.1" 404 -
127.0.0.1 - - [10/Apr/2022 12:45:25] "GET /images/car_real/2.jpg HTTP/1.1" 404 -
127.0.0.1 - - [10/Apr/2022 12:45:25] "GET /images/car_real/3.jpg HTTP/1.1" 404 -
127.0.0.1 - - [10/Apr/2022 12:45:25] "GET /images/car_real/4.jpg HTTP/1.1" 404 -
127.0.0.1 - - [10/Apr/2022 12:45:25] "GET /images/car_real/5.jpg HTTP/1.1" 404 -
127.0.0.1 - - [10/Apr/2022 12:45:25] "GET /images/car_real/6.jpg HTTP/1.1" 404 -
127.0.0.1 - - [10/Apr/2022 12:45:25] "GET /images/car_real/7.jpg HTTP/1.1" 404 -
127.0.0.1 - - [10/Apr/2022 12:45:25] "GET /images/car_real/8.jpg HTTP/1.1" 404 -
127.0.0.1 - - [10/Apr/2022 12:45:25] "GET /images/car_real/9.jpg HTTP/1.1" 404 -
127.0.0.1 - - [10/Apr/2022 12:45:25] "GET /images/car_real/10.jpg HTTP/1.1" 404 -
127.0.0.1 - - [10/Apr/2022 12:45:25] "GET /info.png HTTP/1.1" 404 -
127.0.0.1 - - [10/Apr/2022 12:45:26] "GET /static/images/car_real/colorize_mask/0.png HTTP/1.1" 200 -
127.0.0.1 - - [10/Apr/2022 12:45:26] "GET /favicon.ico HTTP/1.1" 404 -
127.0.0.1 - - [10/Apr/2022 12:45:57] "GET /undo.png HTTP/1.1" 404 -
Current image id: 0
  0% 0/29 [00:00<?, ?it/s]/opt/conda/conda-bld/pytorch_1579022027550/work/aten/src/ATen/native/IndexingUtils.h:20: UserWarning: indexing with dtype torch.uint8 is now deprecated, please use a dtype torch.bool instead.
Warning: indexing with dtype torch.uint8 is now deprecated, please use a dtype torch.bool instead. (expandTensors at /opt/conda/conda-bld/pytorch_1579022027550/work/aten/src/ATen/native/IndexingUtils.h:20)
  0% 0/29 [00:00<?, ?it/s]
[2022-04-10 12:46:10,107] ERROR in app: Exception on /api/edit_from_mask [POST]
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/flask/app.py", line 2073, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/lib/python3.8/site-packages/flask/app.py", line 1518, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/local/lib/python3.8/site-packages/flask_cors/extension.py", line 165, in wrapped_function
    return cors_after_request(app.make_response(f(*args, **kwargs)))
  File "/usr/local/lib/python3.8/site-packages/flask/app.py", line 1516, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python3.8/site-packages/flask/app.py", line 1502, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**req.view_args)
  File "/usr/local/lib/python3.8/site-packages/flask_cors/decorator.py", line 128, in wrapped_function
    resp = make_response(f(*args, **kwargs))
  File "/content/EditGAN-Robert/run_app.py", line 138, in edit_from_mask
    img_out, img_seg_final, optimized_latent = tool.run_optimization_editGAN(seg_mask, curr_latent, roi)
  File "/content/EditGAN-Robert/models/EditGAN/EditGAN_tool.py", line 378, in run_optimization_editGAN
    loss.backward()
  File "/usr/local/lib/python3.8/site-packages/torch/tensor.py", line 195, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/usr/local/lib/python3.8/site-packages/torch/autograd/__init__.py", line 97, in backward
    Variable._execution_engine.run_backward(
RuntimeError: CUDA out of memory. Tried to allocate 5.88 GiB (GPU 0; 15.90 GiB total capacity; 14.24 GiB already allocated; 231.75 MiB free; 15.03 GiB reserved in total by PyTorch)
127.0.0.1 - - [10/Apr/2022 12:46:10] "POST /api/edit_from_mask HTTP/1.1" 500 -
```

dubtor avatar Apr 10 '22 12:04 dubtor

@arieling I can invite you to the Colab if you like, even though some of the settings are hard-coded to my setup. Feel free to reach out via Telegram @dubtor

dubtor avatar Apr 10 '22 13:04 dubtor

@dubtor I managed to run your fork on colab using this notebook https://colab.research.google.com/drive/14nY3p9GG-yfzMziySVqs2zZZk5ArXFiY?usp=sharing

udibr avatar Apr 30 '22 16:04 udibr

Thanks @udibr for sharing! Does this Colab run the full demo for you? I tried to run yours, and in my case it is still running out of CUDA memory, just like my previous tests with my own Colab notebook. My version ran until I clicked the "process" button on the web UI; yours ran OOM already during boot-up of the web app. I was using the GPU runtime on Colab Pro+. Have you done anything differently? Thank you!

dubtor avatar Apr 30 '22 18:04 dubtor

I occasionally do get OOM, but most of the time not. It could be because I'm using "Colab Pro", which gives you better priority on the GPU card being used.

I did manage to modify the tire and headlights of the car image, which was fun, but I have no idea how to use the rest of the features of this app.

udibr avatar Apr 30 '22 18:04 udibr

I just tried again and indeed got OOM. I then did "Runtime -> Disconnect and delete runtime", re-ran the notebook, and it works.

udibr avatar Apr 30 '22 18:04 udibr

Looks like adding the following code at the very top of `run_app.py` helps:

```python
import os
os.environ['PYTORCH_CUDA_ALLOC_CONF'] = "max_split_size_mb:1000"
import torch
torch.cuda.empty_cache()
```
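Note that the placement matters: `PYTORCH_CUDA_ALLOC_CONF` is read when PyTorch's CUDA caching allocator initializes, so setting it after CUDA is already in use has no effect, which is why it has to go before `import torch` at the very top of the file. If OOM still happens intermittently, a generic retry wrapper can sometimes get an edit through on the second attempt. This is a hypothetical helper I'm sketching, not part of the EditGAN codebase:

```python
import gc

def run_with_oom_retry(fn, *args, retries=1):
    """Call fn; on a CUDA out-of-memory RuntimeError, free cached GPU
    memory and retry up to `retries` more times. Other errors re-raise.

    Hypothetical helper -- not part of the EditGAN codebase.
    """
    for attempt in range(retries + 1):
        try:
            return fn(*args)
        except RuntimeError as e:
            if "out of memory" not in str(e) or attempt == retries:
                raise
            gc.collect()
            try:
                import torch
                torch.cuda.empty_cache()  # return cached blocks to the driver
            except ImportError:
                pass  # allows testing the control flow without torch installed
```

For example, the `tool.run_optimization_editGAN(...)` call in `run_app.py` (the one in the traceback above) could be wrapped with it.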

udibr avatar Apr 30 '22 19:04 udibr

Maybe you want to make the editing region smaller to test first. Once you can deploy the model, memory usage depends on the area of your editing region.
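To illustrate that point: since memory scales with the edited area, it can help to check how large the painted region actually is before running the optimization. A small NumPy sketch for computing the tight bounding box of a binary edit mask (illustrative only; how EditGAN derives its ROI internally may differ):

```python
import numpy as np

def roi_bounding_box(mask, pad=8):
    """Return (top, bottom, left, right) of the tight box around the
    nonzero pixels of a 2-D mask, padded by `pad` and clipped to the image.

    Illustrative helper -- not taken from the EditGAN codebase.
    """
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        raise ValueError("empty mask: nothing was painted")
    top = max(int(ys.min()) - pad, 0)
    bottom = min(int(ys.max()) + pad + 1, mask.shape[0])
    left = max(int(xs.min()) - pad, 0)
    right = min(int(xs.max()) + pad + 1, mask.shape[1])
    return top, bottom, left, right
```

A small box here means a small optimization region; if the box covers most of the 512x512 canvas, an OOM on a 16 GiB card is much more likely.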

arieling avatar May 02 '22 02:05 arieling

> looks like adding the following code at the very top of run_app.py helps:
>
> ```python
> import os
> os.environ['PYTORCH_CUDA_ALLOC_CONF'] = "max_split_size_mb:1000"
> import torch
> torch.cuda.empty_cache()
> ```

I've tried that fix and I'm still getting the out-of-memory error on a P100. Restarting the runtime also didn't help.

One difference from the previous descriptions of the problem is that I can't get the URL to open anything.

wandrzej avatar May 04 '22 15:05 wandrzej

> looks like adding the following code at the very top of run_app.py helps:
>
> ```python
> import os
> os.environ['PYTORCH_CUDA_ALLOC_CONF'] = "max_split_size_mb:1000"
> import torch
> torch.cuda.empty_cache()
> ```
>
> I've tried that fix and I'm still getting the out-of-memory error on a P100. Restarting the runtime also didn't help.
>
> One difference from the previous descriptions of the problem is that I can't get the URL to open anything.

I ran into the same problem as you. You may want to check your project against the link below: https://blog.csdn.net/qq_38677322/article/details/109696077

Ley-lele avatar Jul 14 '22 06:07 Ley-lele