dalle-flow
Calling Upscale from Colab is throwing an error
I tried upscaling my images but they always stay at 250x250 and never reach 1024x1024. Server-side error:
ERROR upscaler/rep-0@42
FileNotFoundError('`../SwinIR/results/swinir_real_sr… is not a URL or a valid local path')
FileNotFoundError:
`../SwinIR/results/swinir_real_sr_x4_large/ac3266c0f… is not a URL or a valid local path
Hey @spuliz, the error is clear: the Executor cannot resolve the path you are passing. First of all, I would try sending the absolute path.
Thanks @JoanFM - how and where should I add the absolute path in the Docker files? I feel like the error is coming from the SwinIRUpscaler class inside the swinir executor.py file. Since I am using the provided Docker image, I don't think it is practical to modify the image directly.
In the flow.yml, you can check this section:
uses: SwinIRUpscaler
py_modules:
  - executors/swinir/executor.py
uses_with:
  swinir_path: ../SwinIR
Maybe you want to edit the swinir_path argument to point to where the folder is located inside the Docker image.
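As a sketch of that edit, assuming the SwinIR folder lands at /dalle/SwinIR inside the image (the exact location is an assumption; verify it in your own container):

```yaml
uses: SwinIRUpscaler
py_modules:
  - executors/swinir/executor.py
uses_with:
  swinir_path: /dalle/SwinIR  # assumed absolute path inside the image; verify before relying on it
```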
Hi @JoanFM, I changed the swinir_path both on the EC2 instance and inside the Docker image, in the flow.yml file, using the absolute path to SwinIR.
However, I am still getting the same error:
FileNotFoundError:
`../SwinIR/results/swinir_real_sr_x4_large/0d51a37a4… is not a URL or a valid local path
Hi @JoanFM, are you able to repro the issue? More people are experiencing it. Steps to repro:
- Install Docker (following the installation steps in the repository README)
- Run the Docker image
- Open the provided Client on Google Colab and run all cells
- See that the image does not get upscaled and an error is thrown instead
I did not have time to reproduce, but can you make sure, by logging into the container, that the path ../SwinIR actually exists?
Can you show what the current error is now? It seems it is trying to chdir but cannot, but what exactly is the error?
@JoanFM This is the root cause, but I haven't found a fix yet.
An absolute path of /dalle/SwinIR did not resolve the problem. The real issue appears to be that the path itself is valid but the expected file is absent: before the path is ever read, main_test_swinir.py applies the SwinIR model-zoo model and fails with RuntimeError: CUDA error: out of memory. I've reproduced the issue by running via Docker Desktop on Windows 10 with an RTX 3090 (24 GiB VRAM) on CUDA 11.6.0.
Upscaling does not appear to be viable while this step fails, and I have yet to find a workaround, though I'm considering temporarily stripping out the other services.
Did you try building the Docker image and running it as a container? I just rebuilt and ran it without any issue.
https://github.com/jina-ai/dalle-flow#run-in-docker
git clone https://github.com/jina-ai/dalle-flow.git
cd dalle-flow
docker build --build-arg GROUP_ID=$(id -g ${USER}) --build-arg USER_ID=$(id -u ${USER}) -t jinaai/dalle-flow .
docker run -p 51005:51005 -v $HOME/.cache:/home/dalle/.cache --gpus all jinaai/dalle-flow
Yes, that's the way I built it in the first place. The problem is that it allocates more memory than is available even on a 24 GiB GPU, despite claiming to run in 21.7 GiB.
The configuration being in platform memory mode would suggest that it should deallocate the memory for the other steps once it's done with them, but it seems to keep enough of a baseline allocation that the ~2 GiB needed for upscaling is enough to push it over.
Additionally, by running it on two separate machines, with upscale disabled on one and everything else disabled on the other, I was able to run it successfully, so I can confirm it's a matter of memory limits.
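The memory accounting described above can be sketched numerically; the 24 GiB and 21.7 GiB figures are from this thread, and the ~2 GiB upscaler cost is the rough estimate quoted above:

```python
GIB = 1024 ** 3

total_vram = 24 * GIB           # RTX 3090
claimed_baseline = 21.7 * GIB   # what the flow reports it is using
upscaler_cost = 2 * GIB         # rough estimate for the SwinIR upscale step

headroom = total_vram - claimed_baseline
print(f"headroom: {headroom / GIB:.1f} GiB; upscaler needs ~{upscaler_cost / GIB:.0f} GiB")
# With ~2.3 GiB of headroom against a ~2 GiB requirement, any fragmentation
# or extra allocation tips the run into CUDA out-of-memory unless the
# baseline is actually released before upscaling.
```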
I too am running into this issue on a 3090, running the code via the Docker image, but my error doesn't look like a normal OOM. My system reports at least 4 GiB still free in VRAM when this occurs.
RuntimeError: CUDA error: too many resources requested for launch
It can usually upscale one or two images just fine before throwing this error, and it keeps failing until it is restarted.
Here is the full output I get with that:
@ryakr "CUDA error: too many resources requested" implies that it is requesting too many threads per block. I'm not clear on what may be different between our setups, but I am running a 3090 FTW3 rather than one based on the stock NVIDIA layout, so we may have actual hardware capability differences.
Worth noting: we're both hitting resource limits, just with different resources, and neither is handled by the runtime, which only notices the problem when it realizes the file at the expected path was never created.
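For background, "too many resources requested for launch" generally means a single kernel launch exceeded a per-block hardware limit, such as threads per block or total registers per block. A toy illustration of that budget, assuming the common limits of 1024 threads and 65536 registers per block (the real values come from the device properties of the specific GPU):

```python
MAX_THREADS_PER_BLOCK = 1024     # common limit on recent NVIDIA GPUs (assumption)
MAX_REGISTERS_PER_BLOCK = 65536  # common per-block register file size (assumption)

def launch_fits(threads_per_block: int, regs_per_thread: int) -> bool:
    """Return True if a launch with these parameters stays within both limits."""
    return (threads_per_block <= MAX_THREADS_PER_BLOCK
            and threads_per_block * regs_per_thread <= MAX_REGISTERS_PER_BLOCK)

print(launch_fits(1024, 32))   # 1024 * 32 = 32768 registers: within budget
print(launch_fits(1024, 128))  # 1024 * 128 = 131072 registers: too many resources
```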
@Dessix Yeah, I figured our issues were related in some way; I just thought I'd drop in what I'm seeing to hopefully help narrow it down. I'm running a 3090 Founders Edition myself, so mine is exactly the stock NVIDIA layout.
In my testing just now I did find a curious quirk that points to it being related to overused resources, though. I ran upscale and diffuse on images over 50 times without hitting the issue, but the second I run through the whole process of generating new images, then diffuse, then upscale, that is when I hit the resource limit.
So if you are running the Docker image yourself, a workaround to try is: generate images until you get something you want, restart the Docker container, and then run diffuse and upscale. It might work for you too. It's a bit rough, but it works for now.
I got the same error on Linux Mint and I fixed it by running sudo apt install python-is-python3.
I believe this issue has been resolved. Feel free to reopen if the problem occurs again.