neural-style-pt

Fork implementing multi-region spatial control

Status: Open. genekogan opened this issue 6 years ago · 34 comments

First of all, this is a great repo! It seems a bit faster and more memory efficient than the original lua-based neural-style.

I've made a fork of this repo that adds masked style transfer as described by Gatys, working from the gist you wrote for the Lua version.

I've almost got it working, but my implementation is suffering from two bugs. The first is that, testing with two style images and segmentations, my implementation seems only to get gradients for the first mask but not the second.

So for example, the following command:

python neural_style.py -backend cudnn -style_image examples/inputs/cubist.jpg,examples/inputs/starry_night.jpg -style_seg examples/segments/cubist.png,examples/segments/starry_night.png -content_seg examples/segments/monalisa.png -color_codes white,black

produces the following output:

[Output image: out1]

where the first style (cubist) gets good gradients and appears within its mask, but the second style (starry night) gets little or no gradient signal in its mask.

By simply swapping the order of the style images, as in:

python neural_style.py -backend cudnn -style_image examples/inputs/starry_night.jpg,examples/inputs/cubist.jpg -style_seg examples/segments/starry_night.png,examples/segments/cubist.png -content_seg examples/segments/monalisa.png -color_codes white,black

I get the opposite effect where only the starry night style works and the cubist style in its mask is not there.

[Output image: out2]

I have been trying to debug this by checking the masks, and everything looks right to me, but I can't figure out the problem. This is almost a direct PyTorch mirror of your gist, which does work fine. I'm not sure if there's a typo I'm missing or something deeper.

Additionally, calling loss.backward() without retain_graph=True produces a runtime error (RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.), which makes me think I set up the graph wrong.

If you are able to see what I'm doing wrong so that we can fix it, I'd love to see this implemented in PyTorch. I think it would be a really nice addition to the repo.

genekogan, Oct 31 '19

Nice work on translating my old code!

I think that loss.backward() shouldn't require retain_graph=True, as I don't think that we need to save the intermediate values. The intermediate values will also waste GPU resources if we save them. I'm not sure what is causing the other issue with only the first style image working.

ProGamerGov, Nov 01 '19

Yeah, I agree. I think there might be an issue with how I set up the MaskedStyleLoss layer; maybe the graph gets detached somewhere. Perhaps that's also related to why the second style doesn't get picked up. The code is almost identical to yours; I just applied the changes your gist made to the old Lua code.

genekogan, Nov 01 '19

PyTorch differs from Torch7 in a few ways, because of autograd and because of Python itself.

I suspect that the issue lies with something in the MaskedStyleLoss function. This example may help figure out what is causing the issue: https://discuss.pytorch.org/t/runtimeerror-trying-to-backward-through-the-graph-a-second-time-but-the-buffers-have-already-been-freed-specify-retain-graph-true-when-calling-backward-the-first-time/6795/29

ProGamerGov, Nov 01 '19

OK, I fixed the problem with the graph; I just had to detach the masked Gram matrix before operating on it.
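Detaching matters because the captured target is built while the network weights may still require grad: if the target stays attached to that graph, the second backward() walks through buffers the first one already freed. A minimal sketch of a masked style loss along those lines; `MaskedStyleLossSketch`, its attribute names, and the element-count normalization are illustrative assumptions, not the fork's actual code.

```python
import torch
import torch.nn as nn


def gram_matrix(features: torch.Tensor) -> torch.Tensor:
    """Gram matrix of a (1, C, H, W) feature map, divided by its element count."""
    channels = features.size(1)
    flat = features.reshape(channels, -1)  # assumes batch size 1
    return flat @ flat.t() / flat.numel()


class MaskedStyleLossSketch(nn.Module):
    """Hypothetical masked style loss: one detached target Gram matrix per region."""

    def __init__(self, strength: float = 1.0):
        super().__init__()
        self.strength = strength
        self.mode = "none"
        self.style_masks = []    # (H, W) masks on the style image, one per region
        self.content_masks = []  # (H, W) masks on the output image, same order
        self.targets = []
        self.loss = torch.zeros(())
        self.crit = nn.MSELoss()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.mode == "capture":
            # Detach so the stored targets carry no autograd graph; otherwise the
            # second backward() would traverse buffers freed by the first one.
            self.targets = [gram_matrix(x * m).detach() for m in self.style_masks]
        elif self.mode == "loss":
            losses = [
                self.crit(gram_matrix(x * m), target)
                for m, target in zip(self.content_masks, self.targets)
            ]
            self.loss = self.strength * sum(losses)
        return x
```

With the targets detached, each optimization step builds a fresh graph, so a plain loss.backward() is enough and retain_graph=True is unnecessary.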

But the second style still has no effect. One clue is that there seems to be a large difference in magnitude between the MaskedStyleLoss values for the two styles. I'll keep investigating.

genekogan, Nov 01 '19

I've fixed the bug with the second style and now everything works properly! See result.

[Output image: out]

I need to add a little documentation to the README, and I can also send you a PR here if you'd like. To my eye, it still needs a bit of work: in the paper, the authors describe some nuances in generating the masks beyond simple bilinear scaling. I'm also trying to figure out how to make continuous (non-discrete) masks for transitioning between styles, but this is turning out to be less straightforward than I thought!
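A small sketch of the two mask operations mentioned above: bilinearly scaling a mask down to a feature map's resolution, and building a continuous left-to-right cross-fade between two styles whose weights sum to 1 at every pixel. The function names and the linear ramp are illustrative assumptions; the paper's more careful mask generation is not reproduced here.

```python
import torch
import torch.nn.functional as F


def resize_mask(mask: torch.Tensor, size) -> torch.Tensor:
    """Bilinearly resize an (H, W) mask to size=(h, w), e.g. a VGG layer's resolution."""
    return F.interpolate(mask[None, None], size=size, mode="bilinear", align_corners=False)[0, 0]


def horizontal_blend_masks(height: int, width: int):
    """Two continuous masks that cross-fade left to right and sum to 1 everywhere."""
    ramp = torch.linspace(0.0, 1.0, width).expand(height, width)
    return 1.0 - ramp, ramp


first, second = horizontal_blend_masks(64, 64)
small_first = resize_mask(first, (16, 16))
small_second = resize_mask(second, (16, 16))
```

Bilinear interpolation is a weighted average whose weights sum to 1, so the resized masks still sum to 1 at every position and each output pixel's total style weight is unchanged by the resize.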

genekogan, Nov 04 '19

@genekogan Looks good! I'm not sure about licensing for code translated from the Lua segmentation code. It could conflict with the neural-style-pt license, so it might be better to list it in the wiki, the way the original was linked in the neural-style wiki.

I also wonder if we can simplify the code and improve how it looks/works? Python is a lot more powerful than Lua, and opens up possibilities for improving the code.

ProGamerGov, Nov 06 '19

Sure, I am fine with listing it on the wiki instead.

Yeah, I'd definitely like to improve the code. One thing I'm currently struggling with is blending or transitioning between masks by making them continuous instead of discrete. I've implemented this in a separate branch but it produces poor results in the boundary areas. I wrote about this in more detail in this issue. I'd be curious if you have any ideas.

genekogan, Nov 06 '19

@genekogan Are you making sure that the TV weight is set to 0 in your experiments?

ProGamerGov, Dec 01 '19

Yes; setting tv_weight to 0 has not really helped. I also just started a new branch that replaces Gram loss with @pierre-wilmot's histogram loss as described here. I'm getting interesting results with it, but the big gap in the middle remains. I'm pretty stumped and might start trying more hacky approaches.

genekogan, Dec 03 '19

@genekogan I was actually recently looking into histogram loss myself, after seeing the results from: https://arxiv.org/pdf/1701.08893.pdf. It was used in deep-painterly-harmonization, and it seems like a better idea than performing histogram matching before/after style transfer. deep-painterly-harmonization seems to implement the histogram loss as a type of layer alongside content and style loss layers.

I'm not sure what's causing the gap in middle with your code. I haven't come across the issue myself before, so I have no idea what could be going wrong in your code.

ProGamerGov, Dec 04 '19

Also, on a bit of an unrelated note, have you tried to get gradient normalization from the original Lua/Torch7 code working in PyTorch? I did figure out that it's more like gradient scaling: https://github.com/ProGamerGov/neural-style-pt/issues/26#issuecomment-541355470, but I'm beginning to think that it's not possible in PyTorch without a ton of hacky workarounds.
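For reference, Lua neural-style's -normalize_gradients option divides a loss module's own gradient by its L1 norm (plus a small epsilon) before multiplying by the strength and adding the upstream gradient. One way to approximate that in PyTorch is a custom autograd function that is the identity in the forward pass and rescales the gradient in the backward pass. This is a sketch, not neural-style-pt code; `NormalizeGradient` and `normalized_mse` are hypothetical names.

```python
import torch


class NormalizeGradient(torch.autograd.Function):
    """Identity in the forward pass; L1-normalizes the gradient, then scales it."""

    @staticmethod
    def forward(ctx, x, strength):
        ctx.strength = strength
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        grad = grad_output / (grad_output.abs().sum() + 1e-8)
        # No gradient for the non-tensor strength argument.
        return grad * ctx.strength, None


def normalized_mse(features, target, strength=1.0):
    # Normalization happens before the strength is applied, so this term's
    # gradient has an L1 norm of about `strength`. The returned loss value
    # itself is not scaled by strength.
    return torch.nn.functional.mse_loss(NormalizeGradient.apply(features, strength), target)
```

Because the function wraps only the features entering one loss term, only that term's gradient is normalized; gradients from other loss layers pass through unchanged.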

ProGamerGov, Dec 04 '19

The histogram approach is getting interesting aesthetic results, and seems to work well in combination with normal Gram losses. Pierre also uses Adam to optimize it instead of L-BFGS, which didn't work well in the original neural-style, but maybe could if the hyper-parameters are fine-tuned just right.

Yeah, I'm stumped on the gray region. I don't think there's a bug in the code... I think maybe it's just the expected behavior when you try to spatially mix gradients. I'm still researching alternatives.

I have not tried implementing normalized gradients. My recollection from the original neural-style was that it did not produce dramatic differences, but maybe I am not aware of cases where it might be useful?

genekogan, Dec 05 '19

@genekogan I do recall seeing some issues with gradients when using masks that had very small/thin regions surrounded by other regions. Maybe something like that is being exaggerated by your code?

Gradient normalization in neural-style worked extremely well with higher content and style weight values (e.g. https://github.com/jcjohnson/neural-style/issues/240#issuecomment-225420069, though I'd suggest values closer to content weight 50-500 and style weight 4000-8000). I've also seen users on Reddit say that it made heavily stylized faces look better.

ProGamerGov, Dec 05 '19

Torch7 had really bad default parameter values for the Adam optimizer, which is why neural-style had a parameter for the learning rate. PyTorch's Adam optimizer seems to use better defaults, though I haven't experimented with other values (they are very similar to, if not the same as, the ones I used in modified neural-style versions).

Do you think the histogram results work better as their own separate layers, or as part of the style layers (like in your code, I think)?

ProGamerGov, Dec 05 '19

Yes, I have it in the same layers, which is how Pierre did it. I don't know any reason why it might do better in different layers. I do need to find better values for the strength coefficients, as the histogram loss at the values it has now overwhelms the other loss terms. Pierre wrote in his paper that the best results come from using both histogram and Gram loss together.

genekogan, Dec 05 '19

I implemented histogram loss as its own layer type alongside the content and style layers. The code can be found here: https://gist.github.com/ProGamerGov/30e95ac9ff42f3e09288ae07dc012a76

Histogram loss example output on the left, control test (no histogram loss) on the right:

There are more examples in the comments of the gist.

ProGamerGov, Dec 08 '19

Super nice, I commented further in the gist.

genekogan, Dec 08 '19

Another note about the transitional blending problem. I e-mailed Leon Gatys about it, and he suggested that since covariance loss seems to reduce the smudging effect more than Gram loss, I should try covariance loss on the lower layers (where the differences between the styles are greatest) and Gram loss on the higher layers to preserve better style reconstruction. Going to try that next.
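In code the difference between the two statistics is small: covariance is the Gram matrix of mean-centered features, so it ignores each channel's average activation. A minimal sketch, assuming (C, H, W) feature maps and normalization by the number of spatial positions; the layer split in `COVARIANCE_LAYERS` is a hypothetical illustration of the suggestion above, not a tested configuration.

```python
import torch


def gram(features: torch.Tensor) -> torch.Tensor:
    """Uncentered second moments of a (C, H, W) feature map."""
    flat = features.reshape(features.size(0), -1)
    return flat @ flat.t() / flat.size(1)


def covariance(features: torch.Tensor) -> torch.Tensor:
    """Gram matrix of the mean-centered features."""
    flat = features.reshape(features.size(0), -1)
    centered = flat - flat.mean(dim=1, keepdim=True)
    return centered @ centered.t() / flat.size(1)


# Hypothetical split: covariance on the lower layers, Gram on the higher ones.
COVARIANCE_LAYERS = {"relu1_1", "relu2_1", "relu3_1"}


def style_statistic(layer_name: str, features: torch.Tensor) -> torch.Tensor:
    return covariance(features) if layer_name in COVARIANCE_LAYERS else gram(features)
```

With this normalization, covariance(f) equals gram(f) minus the outer product of the channel means, so adding a constant to the features changes the Gram matrix but leaves the covariance unchanged.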

genekogan, Dec 08 '19

@genekogan I replied to your comment in the gist regarding weights.

Someone has also already implemented covariance loss in neural-style-pt here: https://github.com/ProGamerGov/neural-style-pt/issues/11, so that should help with the covariance loss part of your plan.

ProGamerGov, Dec 08 '19

It looks like there may be an issue with larger image sizes when testing the histogram layers. I don't know enough about C++ to decode the error:

Running optimization with L-BFGS
Traceback (most recent call last):
  File "neural_style_hist_loss.py", line 538, in <module>
    main()
  File "neural_style_hist_loss.py", line 289, in main
    optimizer.step(feval)
  File "/usr/local/lib/python3.5/dist-packages/torch/optim/lbfgs.py", line 307, in step
    orig_loss = closure()
  File "neural_style_hist_loss.py", line 267, in feval
    net(img)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "neural_style_hist_loss.py", line 531, in forward
    target = self.calcHist(input[0], self.target_hist, self.target_min, self.target_max)
  File "neural_style_hist_loss.py", line 518, in calcHist
    cpp.matchHistogram(res, target.clone())
RuntimeError: n cannot be greater than 2^24+1 for Float type. (check_supported_max_int_with_precision at /pytorch/aten/src/ATen/native/TensorFactories.h:78)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x33 (0x7fc56a16b813 in /usr/local/lib/python3.5/dist-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x1bb1638 (0x7fc56c377638 in /usr/local/lib/python3.5/dist-packages/torch/lib/libtorch.so)
frame #2: at::native::randperm_out_cpu(at::Tensor&, long, at::Generator*) + 0x3c (0x7fc56c36fd0c in /usr/local/lib/python3.5/dist-packages/torch/lib/libtorch.so)
frame #3: <unknown function> + 0x1d9e3e4 (0x7fc56c5643e4 in /usr/local/lib/python3.5/dist-packages/torch/lib/libtorch.so)
frame #4: at::native::randperm(long, at::Generator*, c10::TensorOptions const&) + 0xab (0x7fc56c36c5eb in /usr/local/lib/python3.5/dist-packages/torch/lib/libtorch.so)
frame #5: at::native::randperm(long, c10::TensorOptions const&) + 0xe (0x7fc56c36c6ee in /usr/local/lib/python3.5/dist-packages/torch/lib/libtorch.so)
frame #6: <unknown function> + 0x1ecce9b (0x7fc56c692e9b in /usr/local/lib/python3.5/dist-packages/torch/lib/libtorch.so)
frame #7: at::Tensor at::ATenOpTable::callUnboxed<at::Tensor, long, c10::TensorOptions const&>(long, c10::TensorOptions const&) const + 0xb6 (0x7fc565ecd1d4 in /tmp/torch_extensions/histogram_cpp/histogram_cpp.so)
frame #8: <unknown function> + 0x82f69 (0x7fc565ebff69 in /tmp/torch_extensions/histogram_cpp/histogram_cpp.so)
frame #9: torch::randperm(long, c10::TensorOptions const&)::{lambda()#1}::operator()() const + 0x97 (0x7fc565ec8b81 in /tmp/torch_extensions/histogram_cpp/histogram_cpp.so)
frame #10: torch::randperm(long, c10::TensorOptions const&) + 0x192 (0x7fc565ec8d5c in /tmp/torch_extensions/histogram_cpp/histogram_cpp.so)
frame #11: matchHistogram(at::Tensor&, at::Tensor&) + 0x10a (0x7fc565ec0696 in /tmp/torch_extensions/histogram_cpp/histogram_cpp.so)
frame #12: <unknown function> + 0x7e653 (0x7fc565ebb653 in /tmp/torch_extensions/histogram_cpp/histogram_cpp.so)
frame #13: <unknown function> + 0x7b692 (0x7fc565eb8692 in /tmp/torch_extensions/histogram_cpp/histogram_cpp.so)
frame #14: <unknown function> + 0x77343 (0x7fc565eb4343 in /tmp/torch_extensions/histogram_cpp/histogram_cpp.so)
frame #15: <unknown function> + 0x77533 (0x7fc565eb4533 in /tmp/torch_extensions/histogram_cpp/histogram_cpp.so)
frame #16: <unknown function> + 0x6a4a1 (0x7fc565ea74a1 in /tmp/torch_extensions/histogram_cpp/histogram_cpp.so)
<omitting python frames>
frame #21: python3() [0x4ebe37]
frame #25: python3() [0x4ebd23]
frame #27: python3() [0x4fb9ce]
frame #29: python3() [0x574b36]
frame #33: python3() [0x4ebe37]
frame #37: python3() [0x4ebd23]
frame #39: python3() [0x4fb9ce]
frame #41: python3() [0x574b36]
frame #44: python3() [0x5406df]
frame #46: python3() [0x5406df]
frame #48: python3() [0x5406df]
frame #50: python3() [0x540199]
frame #52: python3() [0x60c272]
frame #57: __libc_start_main + 0xf0 (0x7fc5c1e76830 in /lib/x86_64-linux-gnu/libc.so.6)

ProGamerGov, Dec 10 '19

It looks like it may originate in randomIndices[featureMaps.numel()] = torch::randperm(featureMaps.numel()).to(at::kLong).cuda(); Maybe the problem is that featureMaps.numel() exceeds the largest integer a float can represent exactly? I don't see an easy workaround, but a harder one would be to downsample the feature maps before they go into the histogram layer. They probably don't need to be that high-res to get an accurate histogram loss, and downsampling would probably speed things up too, since histogram loss seems to be a lot slower than Gram loss. Another idea would be to simply remove histogram loss from the first style layer and only use it in the later ones, where the feature maps are smaller.
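A sketch of the downsampling workaround: shrink a feature map's spatial size just enough that its total element count stays at or below 2^24, the largest count for which every index is exactly representable as a float32. `downsample_for_histogram` and the use of average pooling are assumptions; whether this alone avoids the error depends on which tensor the C++ extension passes to randperm.

```python
import math

import torch
import torch.nn.functional as F

FLOAT32_EXACT_LIMIT = 2 ** 24


def downsample_for_histogram(features: torch.Tensor,
                             max_numel: int = FLOAT32_EXACT_LIMIT) -> torch.Tensor:
    """Average-pool a (1, C, H, W) feature map so that numel() <= max_numel."""
    if features.numel() <= max_numel:
        return features
    # Scale height and width by the same factor so the element count shrinks
    # by scale**2; flooring the new sizes keeps it at or below the limit.
    scale = math.sqrt(max_numel / features.numel())
    _, _, height, width = features.shape
    new_size = (max(1, int(height * scale)), max(1, int(width * scale)))
    return F.adaptive_avg_pool2d(features, new_size)
```

Averaging smooths the activations, so the pooled map's histogram is somewhat narrower than the original's; whether that hurts the loss would need testing.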

genekogan, Dec 10 '19

I just updated my histogram_loss branch with your Histogram loss module, and made it support masking in the same way the Style Loss does. I also added a covariance loss option for the normal StyleLoss module. So far with limited tests, histogram loss does not seem to do much to fix the blending problem. I'm going to do some tests combining the style loss parameters and see if I can either improve that issue somehow or at least get the general style transfer to look nicer.

genekogan, Dec 11 '19

@genekogan Has covariance loss made any difference with the lower layers?

For the histogram size problem, we could potentially try to recreate the matchHistogram() function in PyTorch, though torch.histc() is not differentiable so we likely can't use that.
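Since torch.histc() has no gradient, one differentiable route (a swapped-in sketch, not a port of Pierre's matchHistogram()) is sort-based matching: for each channel, hand the target's values out to the input's positions in rank order, then use the MSE between the input and that detached remapped copy as the loss, so the gradient flows through the input itself. `match_histogram_sorted` and `histogram_loss` are hypothetical names, and evenly spaced quantiles are used so the two feature maps need not be the same size.

```python
import torch
import torch.nn.functional as F


def match_histogram_sorted(source: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Per-channel sort-based histogram matching.

    source: (C, N), target: (C, M). Returns a (C, N) tensor holding values drawn
    from target's distribution, arranged in source's rank order.
    """
    n = source.size(1)
    sorted_target, _ = target.sort(dim=1)
    # Evenly spaced quantiles of the target, so M does not have to equal N.
    idx = torch.linspace(0, target.size(1) - 1, n, device=target.device).round().long()
    quantiles = sorted_target[:, idx]
    order = source.argsort(dim=1)
    matched = torch.empty_like(quantiles)
    # The position of the k-th smallest source value receives the k-th quantile.
    matched.scatter_(1, order, quantiles)
    return matched


def histogram_loss(features: torch.Tensor, target_features: torch.Tensor) -> torch.Tensor:
    """MSE between (1, C, H, W) features and their histogram-matched copy."""
    channels = features.size(1)
    src = features.reshape(channels, -1)  # assumes batch size 1
    tgt = target_features.reshape(channels, -1)
    matched = match_histogram_sorted(src.detach(), tgt.detach())
    return F.mse_loss(src, matched)
```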

It also looks like Pierre downscales tensors before running them through the matchHistogram() function: https://github.com/pierre-wilmot/NeuralTextureSynthesis/blob/master/main.py#L179-L186

            model.setStyle(torch.nn.functional.interpolate(style, scale_factor = 1.0/4))
            result = torch.nn.functional.interpolate(result, scale_factor = 2)
            model.setStyle(torch.nn.functional.interpolate(style, scale_factor = 1.0/2))
            result = torch.nn.functional.interpolate(result, scale_factor = 2)
            model.setStyle(torch.nn.functional.interpolate(style, scale_factor = 1))

ProGamerGov, Dec 11 '19

I think in that block he is actually just doing a 3-part multiscale generation: capture style at 1/4 scale, generate, upsample 2x, then capture style at 1/2 scale, generate on top of that, upsample 2x again, capture style at full scale, and generate one last time at full resolution (4x the starting size).
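The schedule described above can be written as a loop. `optimize_at_scale` is a hypothetical callback standing in for the actual optimization at each scale, and bilinear resizing is an assumption (the linked code calls interpolate with its default mode, which is nearest-neighbor).

```python
import torch
import torch.nn.functional as F


def multiscale_generate(style, result, optimize_at_scale, scales=(0.25, 0.5, 1.0)):
    """Capture style at each scale in turn, optimizing `result` on top of the previous stage.

    style:  (1, 3, H, W) full-resolution style image
    result: (1, 3, H * scales[0], W * scales[0]) starting image
    """
    for i, scale in enumerate(scales):
        if i > 0:
            # Upsample the previous stage's output (2x for the default scales).
            result = F.interpolate(result, scale_factor=scale / scales[i - 1],
                                   mode="bilinear", align_corners=False)
        style_at_scale = F.interpolate(style, scale_factor=scale,
                                       mode="bilinear", align_corners=False)
        result = optimize_at_scale(style_at_scale, result)
    return result
```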

genekogan, Dec 11 '19

@genekogan I think you're right.

As for the "n cannot be greater than 2^24+1 for Float type" error, I think it's caused by a limitation of float numbers themselves:

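For example, a and b compare equal once they are stored as 32-bit floats:

    import torch
    a = torch.tensor(2**24 + 1, dtype=torch.float32)
    b = torch.tensor(2**24, dtype=torch.float32)
    print(torch.equal(a, b))  # True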

The largest value representable by an n bit integer is (2^n)-1. As noted above, a float has 24 bits of precision in the significand which would seem to imply that 2^24 wouldn't fit.

However.

Powers of 2 within the range of the exponent are exactly representable as 1.0×2^n, so 2^24 can fit and consequently the first unrepresentable integer for float is (2^24)+1. As noted above. Again.

Source: https://stackoverflow.com/questions/3793838/which-is-the-first-integer-that-an-ieee-754-float-is-incapable-of-representing-e

Edit: This is exactly what is happening to us.

Resizing the height and width of tensors for the histogram loss layer did not seem to resolve the issue.
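The limit can be checked without PyTorch by round-tripping integers through IEEE-754 single precision with the standard struct module: 2^24 survives, while 2^24 + 1 rounds back down to 2^24. In the traceback, the failing call is torch::randperm(featureMaps.numel()) with no dtype, which the error message shows is checked as a Float tensor before the .to(at::kLong) conversion. Requesting a 64-bit integer result directly, e.g. torch::randperm(n, torch::kLong), should avoid that float precision check, though this has not been tested against the extension here.

```python
import struct


def to_float32(value) -> float:
    """Round a number to the nearest IEEE-754 float32 and back to a Python float."""
    return struct.unpack("<f", struct.pack("<f", float(value)))[0]


limit = 2 ** 24
print(to_float32(limit))      # 16777216.0, exact
print(to_float32(limit + 1))  # 16777216.0, rounded: 2**24 + 1 needs 25 significant bits
print(to_float32(limit + 2))  # 16777218.0, exact again (even integers still fit)
```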

ProGamerGov, Dec 11 '19

@genekogan I translated my linear-color-transfer.py to PyTorch: https://gist.github.com/ProGamerGov/684c0953395e66db6ac5fe09d6723a5b

The code expects both inputs to have the same size, and it does not convert the BGR images to RGB or un-normalize them (though neither of those things seems to influence the output). Hopefully we can use it to create some sort of histogram matching loss function and replace the bugged CUDA code?

ProGamerGov, Dec 13 '19

I got linear-color-transfer fully working inside neural-style-pt: https://gist.github.com/ProGamerGov/923b1679b243911e71f9bef4a4bda65a

The histogram class is used to perform the standard histogram matching that's normally done via linear-color-transfer, and it's also used by the histogram loss function.

The histogram loss doesn't work well yet: with -hist_mode pca it quickly becomes NaN, breaking down around the 70th iteration, while -hist_mode chol seems to work. I'm not sure exactly why this is happening, as it almost looks like it's working for the first 30 iterations. I think NumPy is used through PyTorch on a few lines; are those NumPy operations done on the CPU?
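The chol mode follows the usual Cholesky-based linear color transfer: whiten the content image's colors using the Cholesky factor of its color covariance, then re-color them with the style image's factor, so the result takes on the style image's color mean and covariance. NumPy itself only runs on the CPU; the minimal sketch below uses torch.linalg instead, so it stays on whatever device the tensors are on. It is not the gist's code: the function name, the (3, H, W) layout, and the small `eps` ridge added to keep the Cholesky factorization well defined are assumptions.

```python
import torch


def chol_color_transfer(content: torch.Tensor, style: torch.Tensor,
                        eps: float = 1e-5) -> torch.Tensor:
    """Give a (3, H, W) content image the color mean/covariance of a (3, H', W') style image."""

    def color_stats(image):
        flat = image.reshape(3, -1)
        mean = flat.mean(dim=1, keepdim=True)
        centered = flat - mean
        cov = centered @ centered.t() / flat.size(1)
        # A small ridge keeps the covariance positive definite for Cholesky.
        cov = cov + eps * torch.eye(3, dtype=image.dtype, device=image.device)
        return centered, mean, cov

    content_centered, _, content_cov = color_stats(content)
    _, style_mean, style_cov = color_stats(style)
    chol_content = torch.linalg.cholesky(content_cov)
    chol_style = torch.linalg.cholesky(style_cov)
    # Whiten with the content factor, then re-color with the style factor.
    whitened = torch.linalg.solve(chol_content, content_centered)
    return (chol_style @ whitened + style_mean).reshape(content.shape)
```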

The chol mode can't seem to handle relu1_1. It also seems like the lower histogram matching loss layers are important for reducing the smudged out gray areas.

ProGamerGov, Dec 13 '19

Using these histogram parameters:

-hist_mode chol -hist_weight 4000 -hist_layers relu2_1,relu3_1,relu4_1,relu4_2,relu5_1 -hist_image examples/inputs/seated-nude.jpg -hist_target content

Histogram loss & histogram matching preprocessing example output on the left, control test (no histogram loss) & histogram matching preprocessing on the right:

And the histogram loss output without histogram matching preprocessing:

ProGamerGov, Dec 14 '19

So, oddly enough this code replicates the results from using the CUDA histogram matching code extremely well:

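    def double_mean(self, tensor):
        # (1, C, H, W) -> (C, H, W) -> (W, H, C), then average over W and H,
        # leaving one mean activation per channel.
        tensor = tensor.squeeze(0).permute(2, 1, 0)
        return tensor.mean(0).mean(0)

    def forward(self, input):
        if self.mode == 'captureS':
            # Store the per-channel means of the target features.
            self.target = self.double_mean(input.detach())
        elif self.mode == 'loss':
            # Note: input is detached here, so this term contributes no gradient
            # to the image; drop .detach() if the loss should affect it.
            input_dmean = self.double_mean(input.detach())
            self.loss = 0.01 * self.strength * self.crit(input_dmean, self.target)
        return input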
Using these parameters:

-hist_weight 40000 -hist_layers relu1_1,relu2_1,relu3_1,relu4_1,relu4_2,relu5_1

With relu1_1 on the left, without relu1_1 on the right:

ProGamerGov, Dec 16 '19

So, MSELoss() is implemented as: ((input-target)**2).mean(). When I combine MSELoss() with mean(0).mean(0) for content and style loss, I get what looks like DeepDream hallucinations.

Both images had .permute(2, 1, 0) applied before I took the means. The image on the left uses MSELoss(input.mean(0).mean(0), target.mean(0).mean(0)), while the image on the right also adds MSELoss(input.mean(1).mean(0), target.mean(1).mean(0)).
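Assuming both reductions start from the same squeeze(0).permute(2, 1, 0) layout used in double_mean, i.e. (W, H, C), mean(0).mean(0) and mean(1).mean(0) both reduce to the per-channel spatial mean, because averaging over two axes gives the same result in either order. If so, the extra term on the right repeats the first and effectively doubles its weight. The check below uses random data and PyTorch's own nn.MSELoss; the tensor sizes are arbitrary.

```python
import torch
import torch.nn as nn

features = torch.rand(1, 8, 5, 7, dtype=torch.float64)  # (1, C, H, W)
whc = features.squeeze(0).permute(2, 1, 0)               # (W, H, C)

first = whc.mean(0).mean(0)    # average over W, then over H
second = whc.mean(1).mean(0)   # average over H, then over W
direct = features.mean(dim=(0, 2, 3))  # per-channel spatial mean, shape (C,)

target = torch.rand(8, dtype=torch.float64)
mse = nn.MSELoss()(first, target)
manual = ((first - target) ** 2).mean()  # MSELoss with its default reduction="mean"
```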

I previously used this code in neural_style_deepdream.py to implement simultaneous style transfer and DeepDream:

-input.mean() * self.strength

So it looks like the code in my comment above is essentially a DeepDream layer (not a histogram matching loss layer), and those DeepDream hallucinations give the style transfer process detail to attach to in bland regions, like the sky in the example input image.
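The -input.mean() * self.strength term is gradient ascent on the layer's mean activation: every activation receives the same gradient, -strength / x.numel(), so minimizing the loss pushes the image toward whatever raises that layer's activations overall. A minimal pass-through layer in the style of neural-style-pt's loss modules; `DeepDreamLossSketch` is a hypothetical name.

```python
import torch
import torch.nn as nn


class DeepDreamLossSketch(nn.Module):
    """Pass-through layer whose loss is the negated, scaled mean activation."""

    def __init__(self, strength: float = 1.0):
        super().__init__()
        self.strength = strength
        self.loss = torch.zeros(())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Minimizing this loss maximizes the mean activation of the layer.
        self.loss = -x.mean() * self.strength
        return x
```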

@genekogan I wonder if this could be used as a possible solution to your blending problem?

ProGamerGov, Dec 17 '19