Reinstalled System and Now Get Unexpected Results

Open 3DTOPO opened this issue 7 years ago • 13 comments

I had to rebuild my Ubuntu system after my NVIDIA driver was broken by the kernel update that was automatically installed for the Meltdown/Spectre vulnerabilities.

From a clean system of 16.04.4, I installed CUDA 8.0, Torch, CuDNN 5.1 and the other requirements. All went smoothly.

But now when I run fast_neural_style.lua with any previously trained model, or with any of the sample trained models, the results are quite different from what I expect. For instance, this image was created with the candy.t7 model:

[Image: candy-640-cosmo]

I get the same results with or without the GPU enabled, and I get the same results on my Mac OS X machine, where I just installed all the required software (but without GPU acceleration).

My best guess is something must have changed with the latest Torch7?

3DTOPO avatar Mar 15 '18 08:03 3DTOPO

I am troubleshooting the issue as best I can. Interestingly, I get identical (unexpected) results if I comment out line 54 of fast_neural_style.lua, that is, if I change it from:

model:evaluate()

to:

-- model:evaluate()

So it seems like the critical evaluate() function is not doing anything now. Any suggestions?
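For readers unfamiliar with what evaluate() is meant to change, here is a minimal sketch in plain Python. It is not Torch's code and not this repository's InstanceNormalization layer; `ToyNorm` and all of its fields are hypothetical names used only for illustration. It shows the general idea that a normalization layer can use the statistics of the current input in training mode and stored running statistics in evaluation mode.

```python
from statistics import fmean, pvariance


class ToyNorm:
    """Toy 1-D normalization layer with a train/eval switch (illustration only)."""

    def __init__(self, momentum=0.1, eps=1e-5):
        self.training = True      # layers start in training mode
        self.momentum = momentum  # weight given to the newest statistics
        self.eps = eps            # avoids division by zero
        self.running_mean = 0.0
        self.running_var = 1.0

    def evaluate(self):
        """Switch to evaluation mode, mirroring the role of model:evaluate()."""
        self.training = False
        return self

    def forward(self, values):
        if self.training:
            # Training mode: normalize with the statistics of this input
            # and fold them into the running estimates.
            mean = fmean(values)
            var = pvariance(values, mean)
            m = self.momentum
            self.running_mean = (1 - m) * self.running_mean + m * mean
            self.running_var = (1 - m) * self.running_var + m * var
        else:
            # Evaluation mode: normalize with the stored running estimates.
            mean, var = self.running_mean, self.running_var
        scale = (var + self.eps) ** 0.5
        return [(v - mean) / scale for v in values]


layer = ToyNorm()
x = [10.0, 12.0, 14.0]
train_out = layer.forward(x)  # uses the mean and variance of x
layer.evaluate()
eval_out = layer.forward(x)   # uses running_mean and running_var instead
print(train_out != eval_out)
```

In this toy, the two modes give different outputs for the same input. A layer that ignored the mode flag would give identical output either way, which would be consistent with the symptom described above, though the sketch does not show that this is what happens in Torch.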

3DTOPO avatar Mar 15 '18 22:03 3DTOPO

I just tried loading candy.t7 manually. Running evaluate() on it gives an error message saying that evaluate is a nil value. If I run evaluate() on a test net created in the same Torch session, I get no errors:

cd fast-neural-style
th

th> require 'nn'
th> require 'fast_neural_style.ShaveImage'
th> require 'fast_neural_style.TotalVariation'
th> require 'fast_neural_style.InstanceNormalization'
th> model=torch.load("models/candy.t7")
th> model:evaluate()
[string "_RESULT={model:evaluate()}"]:1: attempt to call method 'evaluate' (a nil value)
stack traceback:
	[string "_RESULT={model:evaluate()}"]:1: in main chunk
	[C]: in function 'xpcall'
	/home/jeshua/torch/install/share/lua/5.1/trepl/init.lua:661: in function 'repl'
	...shua/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:199: in main chunk
	[C]: at 0x00405d50

th> net = nn.Sequential()
th> net:evaluate()

No Errors

3DTOPO avatar Mar 15 '18 23:03 3DTOPO

I tried using an AWS image configured for Torch, and everything worked as expected. So I archived its torch directory, copied it to my machine, and ran the install script. It now works as expected on my machine.

The manual test from my post directly above still gives the same error, however, so that was a red herring.

This leads me to believe that the current Torch repository is not compatible with fast-neural-style. It's a shame that Torch doesn't have versioned releases, so that one could install a version known to be compatible.

3DTOPO avatar Mar 16 '18 07:03 3DTOPO

Let me guess: do you run into trouble only with instance normalization enabled? So do I...

flaushi avatar Mar 20 '18 18:03 flaushi

Quite possibly; all my models have instance normalization enabled, so I didn't even try without it.

3DTOPO avatar Mar 20 '18 19:03 3DTOPO

Other people are encountering this problem too; see #137. I followed all the workarounds mentioned there, but nothing really helped.

flaushi avatar Mar 21 '18 08:03 flaushi

Thanks, I had not seen that. I can't run CUDA 7.5 because my GPU is not supported before 8.0. But I might try the update script mentioned. Sorry I can't offer any suggestions other than installing an older version of Torch (worked for me).

3DTOPO avatar Mar 21 '18 08:03 3DTOPO

Hmm, OK. Can you give me the ID of the latest git commit in your Torch directory? I would try checking that commit out. You can print it with git log.

flaushi avatar Mar 21 '18 09:03 flaushi

I would if I knew it. I found an AWS image configured for Torch and archived and copied it to my machine, then ran the install script.

3DTOPO avatar Mar 21 '18 09:03 3DTOPO

Run git log and you will see a list of commits. The topmost one is the one I need.

flaushi avatar Mar 21 '18 09:03 flaushi

Nice, thanks, I didn't know that!

Most recent:

commit c9b29cf41ec714ee45b4799c2bd76e82d1b1f267
Author: soumith <[email protected]>
Date: Wed Dec 21 07:25:57 2016 -0800
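For anyone who wants to record the commit of a working Torch checkout programmatically, here is a minimal Python sketch. The helper names `parse_head_commit` and `read_head_commit` are hypothetical; the only external commands assumed are `git -C <dir>` and `git log -1`, which print the most recent commit of the repository in the given directory.

```python
import re
import subprocess

# A commit header line in `git log` output: "commit <40 hex digits>".
COMMIT_LINE = re.compile(r"^commit ([0-9a-f]{40})\b", re.MULTILINE)


def parse_head_commit(log_text):
    """Return the first 40-character commit ID found in `git log` output, or None."""
    match = COMMIT_LINE.search(log_text)
    return match.group(1) if match else None


def read_head_commit(repo_dir):
    """Run `git log -1` in repo_dir and return the ID of its most recent commit."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "log", "-1"],
        capture_output=True,
        text=True,
        check=True,  # raise if git fails, e.g. repo_dir is not a repository
    )
    return parse_head_commit(result.stdout)


sample = (
    "commit c9b29cf41ec714ee45b4799c2bd76e82d1b1f267\n"
    "Date: Wed Dec 21 07:25:57 2016 -0800\n"
)
print(parse_head_commit(sample))
```

The resulting ID can then be passed to git checkout in another clone of the same repository. Whether checking out that commit alone reproduces the working install has not been confirmed in this thread.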

3DTOPO avatar Mar 21 '18 09:03 3DTOPO

This looks like a color saturation problem. One step of the network subtracts a different constant value from each of the RGB channels. It also switches the channel order from RGB to BGR. If the new Ubuntu install does not need this, the colors will be wrong. To confirm, you can swap the channels of the input image from RGB to BGR yourself, for example with Python or Matlab, and then use the new image as input to the network.
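The channel swap suggested above can be sketched in a few lines of plain Python. This is only an illustration: the function name `swap_rgb_to_bgr` is hypothetical, and the image is assumed to be already decoded into rows of (R, G, B) tuples. Reading and writing an actual image file would need an imaging library, which is not shown here.

```python
def swap_rgb_to_bgr(pixels):
    """Return a copy of a row-major image with each (R, G, B) pixel reordered to (B, G, R)."""
    return [[(b, g, r) for (r, g, b) in row] for row in pixels]


# A 1x2 test image: one pure red pixel and one pure green pixel.
image = [[(255, 0, 0), (0, 255, 0)]]
swapped = swap_rgb_to_bgr(image)
print(swapped)
```

Applying the function twice returns the original image, so the same helper also converts BGR back to RGB.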

ArtlyStyles avatar Apr 19 '18 04:04 ArtlyStyles

Apparently it has something to do with normalization in the latest Torch. It's definitely not a saturation or swapped-channel problem.

3DTOPO avatar Apr 19 '18 04:04 3DTOPO