
Calling Upscale from Colab is throwing an error

Open spuliz opened this issue 2 years ago • 14 comments

I tried upscaling my images, but they always stay at 250x250 instead of 1024x1024. Server-side error:

ERROR  upscaler/rep-0@42                                 
       FileNotFoundError('`../SwinIR/results/swinir_real_sr…                    
       is not a URL or a valid local path')  
    FileNotFoundError:                                                       
       `../SwinIR/results/swinir_real_sr_x4_large/ac3266c0f…                    
       is not a URL or a valid local path   

spuliz • May 28 '22 09:05

Hey spuliz, the error is clear: the Executor cannot resolve the path you are passing. As a first step, I would try passing an absolute path.

JoanFM • May 28 '22 09:05

Thanks @JoanFM. How and where should I set the absolute path in the Docker files? I suspect the error comes from the SwinIRUpscaler class in the swinir executor.py file. Since I am using the provided Docker image, I don't think modifying the image directly is a scalable approach.

spuliz • May 28 '22 10:05

In flow.yml you can find this:

    uses: SwinIRUpscaler
    py_modules:
      - executors/swinir/executor.py
    uses_with:
      swinir_path: ../SwinIR

You may want to edit the swinir_path argument so that it points to where the folder will be located inside the Docker image.

JoanFM • May 30 '22 07:05
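A relative path such as ../SwinIR is resolved against the working directory of the process that uses it, not against the location of flow.yml, which may explain why the same setting works in one setup and fails in another. The Python sketch below illustrates this with a throwaway directory layout; resolve_swinir_path and demo_relative_resolution are hypothetical helpers, not part of dalle-flow.

```python
import os
import tempfile
from typing import Tuple


def resolve_swinir_path(swinir_path: str) -> str:
    """Hypothetical helper: turn a configured path into an absolute one.

    os.path.abspath joins a relative path onto the current working
    directory, so "../SwinIR" means different folders depending on
    where the process is running.
    """
    return os.path.abspath(os.path.expanduser(swinir_path))


def demo_relative_resolution() -> Tuple[bool, bool]:
    """Check whether "../SwinIR" exists from two different working directories."""
    original_cwd = os.getcwd()
    with tempfile.TemporaryDirectory() as root:
        # Assumed layout for the demo: <root>/app/dalle-flow next to <root>/app/SwinIR
        app = os.path.join(root, "app")
        os.makedirs(os.path.join(app, "dalle-flow"))
        os.makedirs(os.path.join(app, "SwinIR"))
        try:
            os.chdir(os.path.join(app, "dalle-flow"))
            from_repo = os.path.isdir(resolve_swinir_path("../SwinIR"))  # <root>/app/SwinIR
            os.chdir(app)
            from_parent = os.path.isdir(resolve_swinir_path("../SwinIR"))  # <root>/SwinIR
        finally:
            # Restore the cwd before the temporary directory is deleted
            os.chdir(original_cwd)
    return from_repo, from_parent


if __name__ == "__main__":
    print(demo_relative_resolution())  # (True, False)
```

Passing an absolute path (for example /dalle/SwinIR, the location mentioned later in this thread) removes the dependence on the working directory.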

Hi @JoanFM, I changed swinir_path in flow.yml to the absolute path of SwinIR, both on the EC2 instance and inside the Docker container.
However, I am still getting the same error:

   FileNotFoundError:                                                       
       `../SwinIR/results/swinir_real_sr_x4_large/0d51a37a4…                    
       is not a URL or a valid local path 

spuliz • May 30 '22 09:05

Hi @JoanFM, are you able to reproduce the issue? Other people are experiencing it too. Steps to reproduce:

  1. Install Docker (following the installation steps in the repository's README).
  2. Run the Docker container.
  3. Open the client notebook provided on Google Colab and run all cells.
  4. Observe that the image is not upscaled and an error is thrown.

spuliz • May 31 '22 11:05

I haven't had time to reproduce it, but can you log into the container and make sure that the path ../SwinIR exists?

JoanFM • May 31 '22 11:05
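One way to do that check is to open a shell in the running container (docker exec -it <container> bash, using the container name or ID shown by docker ps) and run a small script from the directory the executor is started in. The Python sketch below is such a script; check_swinir_layout and the file name check_swinir.py are hypothetical, and the file and folder names it looks for are taken from the error log and comments in this thread rather than from the executor's code.

```python
import os
import sys
from typing import Dict


def check_swinir_layout(swinir_path: str) -> Dict[str, bool]:
    """Report which SwinIR paths mentioned in this thread exist under swinir_path."""
    root = os.path.abspath(swinir_path)
    return {
        "swinir_dir": os.path.isdir(root),
        "main_test_swinir.py": os.path.isfile(os.path.join(root, "main_test_swinir.py")),
        # Assumption: this folder only appears after at least one successful upscale run
        "results/swinir_real_sr_x4_large": os.path.isdir(
            os.path.join(root, "results", "swinir_real_sr_x4_large")
        ),
    }


if __name__ == "__main__":
    # Usage: python check_swinir.py [swinir_path]   (defaults to ../SwinIR)
    target = sys.argv[1] if len(sys.argv) > 1 else "../SwinIR"
    print("resolved:", os.path.abspath(target))
    for name, present in check_swinir_layout(target).items():
        print(f"{name}: {'ok' if present else 'MISSING'}")
```

If swinir_dir and main_test_swinir.py are present but the results folder never appears, the path itself is probably fine and the script is failing before it writes its output.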

Can you show the current error? It seems it is trying to chdir and failing, but what exactly is the error?

JoanFM • May 31 '22 13:05

@JoanFM, this is the root cause, but I haven't found a fix yet. [screenshot]

spuliz • May 31 '22 17:05

Using an absolute path of /dalle/SwinIR did not resolve the problem. The real issue appears to be that the path is valid but the file is absent: earlier in the log, main_test_swinir.py runs the SwinIR model-zoo model and runs out of memory with RuntimeError: CUDA error: out of memory, so the output file is never written. I've reproduced the issue running via Docker Desktop on Windows 10 with an RTX 3090 (24 GiB VRAM) on CUDA 11.6.0.

Upscaling does not appear to be viable while this file is missing, and I have yet to find a workaround, though I'm considering temporarily stripping out the other services.

Dessix • Jun 08 '22 08:06

Did you try building the Docker image and running it as a Docker container? I just rebuilt it and ran it without any issue.

https://github.com/jina-ai/dalle-flow#run-in-docker

    git clone https://github.com/jina-ai/dalle-flow.git
    cd dalle-flow

    docker build --build-arg GROUP_ID=$(id -g ${USER}) --build-arg USER_ID=$(id -u ${USER}) -t jinaai/dalle-flow .

    docker run -p 51005:51005 -v $HOME/.cache:/home/dalle/.cache --gpus all jinaai/dalle-flow

hanxiao • Jun 11 '22 19:06

Yes, that's how I built it in the first place. The problem is that it allocates more memory than is available even on a 24 GiB GPU, despite claiming to run in 21.7 GiB.

Since the configuration is in platform memory mode, it should deallocate the memory used by the other steps once it's done with them. However, its baseline allocation seems high enough that the ~2 GB needed for upscaling pushes it over the limit.

Additionally, I was able to run it successfully on two separate machines, with upscaling disabled on one and everything else disabled on the other, so I can confirm it's a matter of memory limits.

Dessix • Jun 12 '22 00:06
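To see how close the baseline allocation is to the card's limit before upscaling, free VRAM can be read with nvidia-smi's CSV query mode. The Python sketch below does that; the ~2 GB upscaling budget is the estimate from the comment above, not a measured figure, and free memory alone may not be sufficient, since the next report in this thread hits a different CUDA error with VRAM still free. UPSCALE_NEED_MIB, parse_nvidia_smi and query_gpus are hypothetical names.

```python
import shutil
import subprocess
from typing import List, NamedTuple

# Assumption: the "~2 gigabytes" estimate from this thread, converted to MiB (about 1907)
UPSCALE_NEED_MIB = 2 * 1000**3 // 1024**2

QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.total,memory.used,memory.free",
    "--format=csv,noheader,nounits",
]


class GpuMemory(NamedTuple):
    total_mib: int
    used_mib: int
    free_mib: int


def parse_nvidia_smi(csv_text: str) -> List[GpuMemory]:
    """Parse one 'total, used, free' line per GPU (values in MiB)."""
    gpus = []
    for line in csv_text.strip().splitlines():
        total, used, free = (int(field) for field in line.split(","))
        gpus.append(GpuMemory(total, used, free))
    return gpus


def query_gpus() -> List[GpuMemory]:
    """Return per-GPU memory figures, or an empty list if nvidia-smi is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(QUERY, capture_output=True, text=True)
    if result.returncode != 0:
        return []
    return parse_nvidia_smi(result.stdout)


if __name__ == "__main__":
    for index, gpu in enumerate(query_gpus()):
        verdict = "probably enough" if gpu.free_mib >= UPSCALE_NEED_MIB else "likely too little"
        print(f"GPU {index}: {gpu.free_mib} of {gpu.total_mib} MiB free ({verdict} for upscaling)")
```

Running it inside the container right before the upscale request, and again after generation and diffusion, would show how much of the card the earlier steps still hold.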

I am also running into this issue on a 3090, running the code via the Docker image, but my error doesn't look like a normal OOM. My system reports at least 4 GiB of VRAM still free when it occurs.

RuntimeError: CUDA error: too many resources requested for launch

It can usually upscale one or two images just fine, then throws this error until it is restarted. Here is the full output I get: [screenshot]

ryakr • Jun 13 '22 03:06

@ryakr "CUDA error: too many resources requested" implies that it is requesting too many threads per block. I'm not sure what differs between our setups, but I am running a 3090 FTW3 rather than one based on the stock NVIDIA layout, so we may have actual hardware capability differences.

Worth noting: we're both hitting resource limits, just different ones, and neither is handled by the runtime, which only notices a problem when it finds that the file at the expected path has not been created.

Dessix • Jun 13 '22 03:06
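The failure mode described above, where the subprocess dies but the caller only notices a missing file, can be made explicit by checking the exit status before looking for the output. The Python sketch below assumes the upscaler runs main_test_swinir.py as a separate process, which the logs in this thread suggest but do not confirm; run_and_collect and UpscaleFailed are hypothetical names, not dalle-flow's actual code.

```python
import os
import subprocess
import sys
from typing import List, Optional


class UpscaleFailed(RuntimeError):
    """Raised when the upscaling subprocess exits with a non-zero status."""


def run_and_collect(cmd: List[str], expected_output: str, cwd: Optional[str] = None) -> str:
    """Run cmd, then return expected_output, failing loudly at whichever step breaks.

    expected_output is checked relative to the caller's working directory,
    not cwd, so an absolute path is the safer choice.
    """
    result = subprocess.run(cmd, cwd=cwd, capture_output=True, text=True)
    if result.returncode != 0:
        # Surface the subprocess's own error (e.g. a CUDA OOM traceback)
        # instead of a later, misleading FileNotFoundError.
        stderr_tail = result.stderr.strip()[-2000:]
        raise UpscaleFailed(f"{cmd!r} exited with status {result.returncode}:\n{stderr_tail}")
    if not os.path.isfile(expected_output):
        raise FileNotFoundError(f"{expected_output} was not created by {cmd!r}")
    return expected_output


if __name__ == "__main__":
    # Simulated failure: sys.exit with a string prints it to stderr and exits with status 1
    simulated = [sys.executable, "-c", "import sys; sys.exit('RuntimeError: CUDA error: out of memory')"]
    try:
        run_and_collect(simulated, "results/never_written.png")
    except UpscaleFailed as err:
        print(err)
```

With a check like this, the error reported to the client would name the out-of-memory or launch-resource failure directly rather than the missing file.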

@Dessix Yeah, I figured our issues were related in some way; I just thought I would share what I'm seeing to help narrow it down. I'm running a 3090 Founders Edition myself, so it is exactly the stock NVIDIA layout.

In my testing just now, I did find a curious quirk that points to resource exhaustion, though. I ran upscale and diffuse on images over 50 times without hitting the issue, but as soon as I run the whole process (generate new images, diffuse, then upscale), I hit the resource limit.

So if you are running the Docker container yourself, a workaround to try is to generate images until you get something you want, restart the container, and then run diffuse and upscale. It might work for you too. It's a bit rough, but it works for now.

ryakr • Jun 13 '22 03:06

I got the same error on Linux Mint and fixed it by running sudo apt install python-is-python3.

Zaithe • Sep 09 '22 00:09
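python-is-python3 is a Debian/Ubuntu package that makes the python command point to python3, and Linux Mint is Ubuntu-based. Why it helps here is not confirmed in the thread. One plausible explanation is that something in the upscale path launches a bare python command, which fails when only python3 is installed, so the SwinIR script never runs and its output file never appears. If that is the cause, launching child scripts with sys.executable avoids depending on the alias. The Python sketch below shows the idea; interpreter_cmd is a hypothetical helper.

```python
import shutil
import subprocess
import sys
from typing import List


def interpreter_cmd(*args: str) -> List[str]:
    """Build a command that runs the current interpreter with args.

    sys.executable is the path of the running Python, so this works
    whether or not a bare `python` command exists on PATH.
    """
    return [sys.executable, *args]


if __name__ == "__main__":
    # Show which interpreter names resolve on this system
    print("python :", shutil.which("python") or "not found on PATH")
    print("python3:", shutil.which("python3") or "not found on PATH")
    # Runs regardless of whether python-is-python3 is installed
    result = subprocess.run(interpreter_cmd("-c", "print('ok')"), capture_output=True, text=True, check=True)
    print(result.stdout.strip())  # ok
```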

I believe this issue has been resolved. Feel free to reopen if the problem occurs again.

delgermurun • Oct 07 '22 13:10