DeepDeblur-PyTorch
Running Demo (MultiSaver) on Windows
I spent many hours trying to get this to work under Windows. I have now managed to get it working, so this may be useful to others.
Setup
The first obstacle is the readline Python package, which is available by default on Unix systems but not on Windows. To fix this, simply install the pyreadline package, which is a Windows port of readline.
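A quick way to check this before launching main.py is to ask Python's import system whether a module named readline can be found. This is a sketch; it assumes, as its documentation describes, that pyreadline installs a top-level readline module on Windows. pyreadline itself is no longer maintained and reportedly fails to import on Python 3.10 and newer, where the fork pyreadline3 is commonly used instead.

```python
import importlib.util
import sys


def readline_available() -> bool:
    """Return True if a module named 'readline' can be imported.

    On Unix this is normally the standard-library readline module; on
    Windows it is only present after installing a port such as pyreadline.
    """
    return importlib.util.find_spec("readline") is not None


if __name__ == "__main__":
    if readline_available():
        print("readline found")
    else:
        print("readline missing; on Windows try: pip install pyreadline", file=sys.stderr)
```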
Understanding the command-line
Example command: python main.py --save_dir REDS_L1 --demo_input_dir d:/datasets/motion47set/noise_only --demo_output_dir ../results/motion47set
Explanation: specifying --demo_input_dir (or --demo true) runs an evaluation using the pretrained model stored in the experiment folder named by --save_dir. Every image of my motion47set is evaluated. The results are saved in a folder results/motion47set at the project root, alongside the folders src and experiment.
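Since the log further down loads the model from ../experiment and reports results in ../results, main.py appears to be run from inside src. A small sketch that assembles the same invocation as an argument list, under that assumption (build_demo_command and run_demo are hypothetical helper names; the flags are the ones used above):

```python
import os
import subprocess
import sys
from typing import List


def build_demo_command(save_dir: str, input_dir: str, output_dir: str) -> List[str]:
    """Assemble the demo invocation as an argument list.

    A list (instead of one string) avoids quoting problems with paths
    that contain spaces, such as 'd:/Program Files/...'.
    """
    return [
        sys.executable, "main.py",
        "--save_dir", save_dir,           # experiment folder holding the pretrained model
        "--demo_input_dir", input_dir,    # folder of blurry images to evaluate
        "--demo_output_dir", output_dir,  # where the deblurred images are written
    ]


def run_demo(project_root: str, save_dir: str, input_dir: str, output_dir: str) -> None:
    # Assumption: main.py lives in <project_root>/src, so relative paths such as
    # ../results/motion47set are resolved against that directory.
    cmd = build_demo_command(save_dir, input_dir, output_dir)
    subprocess.run(cmd, cwd=os.path.join(project_root, "src"), check=True)
```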
Note that even getting this far is not very intuitive, as others have already pointed out. Usually there is a separate Python script just for evaluation/testing/inference. Also, the term demo is a bit unusual; at first I was expecting some kind of interactive demonstration. And at first I used save_dir for what demo_output_dir actually does.
Another word of caution: if the output path is given without the leading .., the results somehow end up at d:/results/motion47set, i.e. at the root of the drive the project is located on, which again took me a while to figure out. I suggest printing the absolute output directory (via os.path.abspath) to the user at some point, for clarity.
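A likely explanation, assuming the path was passed as /results/motion47set (a leading slash but no ..): on Windows, a path that starts with a slash but has no drive letter is relative to the root of the current drive, not to the current directory. The sketch below uses ntpath, which implements Windows path rules on any OS, and a hypothetical project location d:\project\src; describe_output_dir is a hypothetical helper implementing the os.path.abspath suggestion.

```python
import ntpath
import os


def resolve_windows_output(cwd: str, output_dir: str) -> str:
    """Resolve output_dir against cwd using Windows path rules.

    A path that starts with a slash but has no drive letter keeps the
    drive of cwd but starts again from that drive's root directory.
    """
    return ntpath.normpath(ntpath.join(cwd, output_dir))


def describe_output_dir(output_dir: str) -> str:
    # Report the absolute path that will actually be used, not the raw argument.
    return "results are saved in {}".format(os.path.abspath(output_dir))


cwd = "d:\\project\\src"  # hypothetical location of main.py
print(resolve_windows_output(cwd, "../results/motion47set"))  # d:\project\results\motion47set
print(resolve_windows_output(cwd, "/results/motion47set"))    # d:\results\motion47set
```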
Bug
Running the above command will produce the following output:
===> Loading demo dataset: Demo
Loading model from ../experiment\REDS_L1\models\model-200.pt
Loading optimizer from ../experiment\REDS_L1\optim\optim-200.pt
Loss function: 1*L1
Metrics: PSNR,SSIM
Loading loss record from ../experiment\REDS_L1\loss.pt
===> Initializing trainer
results are saved in ../results/motion47set
| | 0/90 [00:00<?, ?it/s]Can't pickle local object 'MultiSaver.begin_background.<locals>.t'
|██▏ | 4/90 [00:06<02:14, 1.56s/it]Traceback (most recent call last):
File "<string>", line 1, in <module>
File "d:\Program Files\Anaconda3\envs\torch gpu\lib\multiprocessing\spawn.py", line 105, in spawn_main
exitcode = _main(fd)
File "d:\Program Files\Anaconda3\envs\torch gpu\lib\multiprocessing\spawn.py", line 115, in _main
self = reduction.pickle.load(from_parent)
EOFError: Ran out of input
|█████▋ | 11/90 [00:10<01:12, 1.09it/s]forrtl: error (200): program aborting due to control-C event
Also note that Ctrl+C takes a really long time to terminate the program for me, and even slows down my entire machine for several seconds.
This is difficult to debug, because there is no fatal exception and everything seems to run normally apart from these messages, which for all we know might just be warnings. For a while I did not realize that MultiSaver is a class in this project (in utils.py), which is why there is not much help online regarding this error. Also, the only thing that gives a slightly stronger hint that this is an error and not a warning is the EOFError, and I still don't know why or where it happens. A large part of my debugging time went into assuming these were just warnings and trying to fix the command-line arguments instead, since those are easy to get wrong.
What is actually happening is that the MultiSaver code runs cleanly in the main process, but each spawned worker process fails without the main process being aware of it. As a result, the program runs to completion and attempts to save the output images, which silently does nothing because the workers are already dead. I'm not sure how best to achieve this, but it would be nice if the program stopped when it is unable to save output images (at least in demo mode, where that is about the only purpose).
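Why the main process never notices can be read off the save_image method posted further down this thread. As far as I can tell from multiprocessing's Windows implementation, the child process is created first and the target is pickled into it afterwards, so the pickling error is raised in the parent by p.start() inside begin_background; save_image catches it, prints it (the "Can't pickle local object" line) and returns. The child that was already launched finds an empty pipe, which is where "EOFError: Ran out of input" comes from. Because self.queue was assigned before the failure, every later save_image call skips begin_background and puts images into a queue that no worker reads. A minimal fail-fast sketch, assuming the saver keeps its workers in a list as MultiSaver does in self.process (ensure_workers_alive and queue_image are hypothetical helpers), is to check the workers before queueing, and to stop catching the exception from begin_background:

```python
import multiprocessing as mp


def ensure_workers_alive(processes) -> None:
    """Raise RuntimeError if any background saver process is not running.

    A Process that never started, or that exited early, reports
    is_alive() == False; its exitcode is None if it never ran.
    """
    dead = [p for p in processes if not p.is_alive()]
    if dead:
        details = ", ".join("{} (exitcode={})".format(p.name, p.exitcode) for p in dead)
        raise RuntimeError("background image saver not running: " + details)


def queue_image(saver_queue, processes, item) -> None:
    # Hypothetical replacement for the last step of save_image: check the
    # workers first, so a dead pool stops the run instead of silently
    # accumulating images in a queue nobody reads.
    ensure_workers_alive(processes)
    saver_queue.put(item)


if __name__ == "__main__":
    never_started = mp.Process(target=print)
    try:
        ensure_workers_alive([never_started])
    except RuntimeError as e:
        print(e)
```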
The keywords to locate the actual issue here are pickle and multiprocessing. Going into utils.py and looking at the class MultiSaver shows a method begin_background, which defines a local function t. Defining that function works; however, under Windows, multiprocessing starts new processes with the spawn method, so t has to be pickled (serialized) to hand it over to the mp.Process, which runs it in a separate process. This fails because pickle stores a function by its module and qualified name, and a function defined inside another function cannot be looked up that way, so pickle does not support such local objects.
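The failure can be reproduced without multiprocessing at all. A function defined inside a method has "<locals>" in its qualified name (__qualname__), which is exactly what the "Can't pickle local object 'MultiSaver.begin_background.<locals>.t'" message refers to. Saver and can_pickle below are hypothetical names used only for this illustration:

```python
import pickle


class Saver:
    """Hypothetical stand-in for MultiSaver, reduced to the relevant part."""

    def begin_background(self):
        # Defined inside a method, like t in the original begin_background.
        def t(queue):
            return queue
        return t


def can_pickle(obj) -> bool:
    """Return True if pickle can serialize obj.

    Depending on the Python version and on whether the C or pure-Python
    pickler is used, a local function raises AttributeError or PicklingError.
    """
    try:
        pickle.dumps(obj)
    except (AttributeError, pickle.PicklingError):
        return False
    return True


local_t = Saver().begin_background()
print(local_t.__qualname__)  # Saver.begin_background.<locals>.t
print(can_pickle(local_t))   # False
```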
I tried various ways to change the scope of t:
- put global t before the definition of t (no change)
- move t to the outermost scope of the file utils.py, i.e. the same level as MultiSaver (can pickle the method, but later fails at a different point)
- the solution that works: put t directly inside the class MultiSaver, at the same level as its other methods, and decorate it with @staticmethod. The decorator prevents the first parameter from being bound to self (the instance).
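The effect of @staticmethod can be seen in isolation. In the sketch below (WithoutStatic and WithStatic are hypothetical classes), the decorator only matters when t is accessed through an instance; accessed through the class, as in target=MultiSaver.t, a plain function in the class body behaves the same in Python 3.

```python
class WithoutStatic:
    def t(queue):  # plain function in the class body, no 'self'
        return queue


class WithStatic:
    @staticmethod
    def t(queue):  # same function, marked as a static method
        return queue


# Through the class, both behave the same in Python 3:
print(WithoutStatic.t("q"), WithStatic.t("q"))  # q q

# Through an instance, only the static method ignores the instance:
print(WithStatic().t("q"))  # q
try:
    WithoutStatic().t("q")  # the instance is passed as 'queue', so 2 arguments arrive
except TypeError as e:
    print("TypeError:", e)
```

One consequence of the decorator: self.t and MultiSaver.t then refer to the very same plain function, so either can be passed as the Process target. Without it, self.t would be a bound method, and pickling a bound method also pickles the instance it is bound to, along with everything that instance holds.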
So my modification looks like this:
class MultiSaver():
    ...
    @staticmethod
    def t(queue):
        ...

    def begin_background(self):
        self.queue = mp.Queue()
        worker = lambda: mp.Process(target=MultiSaver.t, args=(self.queue,), daemon=False)
        ...
    ...
After this change, everything works as expected. I haven't tested it, but I suspect this will still work under Unix as well.
I'm not sure whether this works if multiple instances of MultiSaver are created; maybe that gives the same result as putting t at the outermost scope, i.e. it fails again.
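Regarding multiple instances: in the modified version, t receives its queue as an argument and is referenced through the class, so, as far as I can tell, each instance's workers depend only on that instance's queue, and creating several instances should not bring the pickling error back. Below is a sketch to test this, also on Linux or macOS, where multiprocessing.get_context("spawn") forces the same start method Windows uses. SpawnSaver is a hypothetical, trimmed-down stand-in for MultiSaver that only counts items instead of writing images. With spawn, the entry point must be guarded by if __name__ == "__main__": because every child process re-imports the main module.

```python
import multiprocessing as mp


class SpawnSaver:
    """Hypothetical, trimmed-down stand-in for MultiSaver (writes no files)."""

    def __init__(self, n_workers=1):
        # "spawn" is the start method Windows uses; forcing it here lets the
        # pickling behaviour be reproduced on Linux or macOS as well.
        self.ctx = mp.get_context("spawn")
        self.n_workers = n_workers
        self.queue = None
        self.process = []

    @staticmethod
    def t(queue):
        """Worker loop; counts items until the (None, None) sentinel arrives."""
        handled = 0
        while True:
            item, name = queue.get()  # blocks instead of busy-polling queue.empty()
            if name is None:
                return handled
            handled += 1  # a real saver would write item to name here

    def begin_background(self):
        # Each instance gets its own queue. The target is pickled by name
        # (SpawnSaver.t) and only the queue is passed along, not the instance.
        self.queue = self.ctx.Queue()
        self.process = [
            self.ctx.Process(target=SpawnSaver.t, args=(self.queue,), daemon=False)
            for _ in range(self.n_workers)
        ]
        for p in self.process:
            p.start()

    def end_background(self):
        for _ in self.process:
            self.queue.put((None, None))  # one sentinel per worker
        for p in self.process:
            p.join()


if __name__ == "__main__":
    # Required with spawn: child processes re-import the main module.
    a, b = SpawnSaver(), SpawnSaver()
    a.begin_background()
    b.begin_background()
    a.queue.put(("image-a", "a.png"))
    b.queue.put(("image-b", "b.png"))
    a.end_background()
    b.end_background()
    print([p.exitcode for p in a.process + b.process])
```

If both instances work, every exit code printed at the end should be 0.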
Thanks for your comment!
May I ask you a question? I'm wondering whether I can use the trained model like YOLO, e.g.:
import python.darknet as dn
dn.set_gpu(0)
net = dn.load_net(str.encode("cfg/tiny-yolo.cfg"), str.encode("weights/tiny-yolo.weights"), 0)
meta = dn.load_meta(str.encode("cfg/coco.data"))
r = dn.detect(net, meta, str.encode("data/dog.jpg"))
print(r)
@mj9 Thanks for the hard work and time spent. Your solution works like a charm on my Windows machine.
Hi. I have been trying to run the demo on Windows following your guide, and when I run the program I get this error:
===> Loading demo dataset: Demo
Loss function: 1*L1
Metrics: PSNR,SSIM
===> Initializing trainer
results are saved in ../results
| | 0/1000 [00:00<?, ?it/s]name 't' is not defined
|█▉ | 42/1000 [01:31<34:40, 2.17s/it]
The elapsed time keeps increasing in the console, but I am not getting any results in the output folder; it only creates empty folders. It seems to be related to the t() function in the MultiSaver class; following your guide no longer works for me. Can you look into this? Thank you
I don't think the code in this repository has changed much, so I'm guessing my approach still works. Are you sure you implemented it correctly? You could post the code of your MultiSaver implementation.
@mj9 Hi, can you share your utils.py with us? Please post the contents of the file directly here in the thread. I've been struggling for days to get this running on Windows 10. Thank you. My code looks like this:
class MultiSaver():
    def __init__(self, result_dir=None):
        self.queue = None
        self.process = None
        self.result_dir = result_dir

    def begin_background(self):
        self.queue = mp.Queue()

        @staticmethod
        def t(queue):
            while True:
                if queue.empty():
                    continue
                img, name = queue.get()
                if name:
                    try:
                        basename, ext = os.path.splitext(name)
                        if ext != '.png':
                            name = '{}.png'.format(basename)
                        imageio.imwrite(name, img)
                    except Exception as e:
                        print(e)
                else:
                    return

        worker = lambda: mp.Process(target=MultiSaver.t, args=(self.queue,), daemon=False)
        cpu_count = min(8, mp.cpu_count() - 1)
        self.process = [worker() for _ in range(cpu_count)]
        for p in self.process:
            p.start()

    def end_background(self):
        if self.queue is None:
            return
        for _ in self.process:
            self.queue.put((None, None))

    def join_background(self):
        if self.queue is None:
            return
        while not self.queue.empty():
            time.sleep(0.5)
        for p in self.process:
            p.join()
        self.queue = None

    def save_image(self, output, save_names, result_dir=None):
        result_dir = result_dir if self.result_dir is None else self.result_dir
        if result_dir is None:
            raise Exception('no result dir specified!')

        if self.queue is None:
            try:
                self.begin_background()
            except Exception as e:
                print(e)
                return

        # assume NCHW format
        if output.ndim == 2:
            output = output.expand([1, 1] + list(output.shape))
        elif output.ndim == 3:
            output = output.expand([1] + list(output.shape))

        for output_img, save_name in zip(output, save_names):
            # assume image range [0, 255]
            output_img = output_img.add_(0.5).clamp_(0, 255).permute(1, 2, 0).to('cpu', torch.uint8).numpy()
            save_name = os.path.join(result_dir, save_name)
            save_dir = os.path.dirname(save_name)
            os.makedirs(save_dir, exist_ok=True)
            self.queue.put((output_img, save_name))

        return
And my running text is
@mj9 Hi, Can you share your utils.py with us?
I don't have access to the code currently, but note how your method t is inside another function (begin_background). You have to put it directly under MultiSaver, at the same level as the other methods.
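For reference, a sketch of the posted class with that change applied: t moves out of begin_background into the class body, still decorated with @staticmethod. Two further changes are my own and not part of the original: t blocks on queue.get() instead of spinning on queue.empty() (which keeps a CPU core busy per idle worker), and the worker count is clamped to at least 1 so a single-core machine still gets a worker. end_background, join_background and save_image can stay as posted. imageio is assumed to be installed, as in the original utils.py; it is imported inside the worker only so the sketch can be loaded without it.

```python
import multiprocessing as mp
import os


class MultiSaver():
    def __init__(self, result_dir=None):
        self.queue = None
        self.process = None
        self.result_dir = result_dir

    # Moved out of begin_background: defined directly in the class body, so
    # multiprocessing can pickle it by name ('MultiSaver.t') under spawn.
    @staticmethod
    def t(queue):
        while True:
            img, name = queue.get()  # blocking get instead of polling queue.empty()
            if not name:  # (None, None) sentinel from end_background
                return
            try:
                import imageio  # assumption: installed, as in the original utils.py
                basename, ext = os.path.splitext(name)
                if ext != '.png':
                    name = '{}.png'.format(basename)
                imageio.imwrite(name, img)
            except Exception as e:
                print(e)

    def begin_background(self):
        self.queue = mp.Queue()
        worker = lambda: mp.Process(target=MultiSaver.t, args=(self.queue,), daemon=False)
        cpu_count = max(1, min(8, mp.cpu_count() - 1))  # at least one worker
        self.process = [worker() for _ in range(cpu_count)]
        for p in self.process:
            p.start()
```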