
Regarding the method of not resizing the input to 256x256

Open OSHMOS opened this issue 11 months ago • 6 comments

I am deeply grateful for your achievements.

However, I would like to extract albedo and shading using the original image as input. I am curious whether I must resize it to 256x256, or whether I can feed the original image without separate training. Which part of Network.py should I look at for this?

OSHMOS avatar Nov 26 '24 05:11 OSHMOS

@Morpheus3000

In my opinion, separate training seems necessary.

Would it be possible for you to share the training code and dataset?

OSHMOS avatar Jan 03 '25 13:01 OSHMOS

Hello and thank you for your interest in my work!

You should be able to infer without resizing. Can you detail the problem that you are having?

As for the training code, since this was part of a project at a company, I have to ask for permission to release it. In the meantime, I advise taking a look at the details provided in the paper and the supplementary material; most of the details needed to reconstruct the training code should be there.

Morpheus3000 avatar Jan 16 '25 14:01 Morpheus3000

@Morpheus3000 When I inferred with a custom image as input, I got a dimension mismatch error, so I used torch.nn.functional.interpolate() to match the dimensions for inference. I didn't touch any layers that carry weights; I only adjusted the weight-free parts, so I believe it works without problems.
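Roughly, the kind of change I made looks like this (a minimal sketch; the function and variable names are placeholders, not the actual Network.py attributes):

```python
import torch
import torch.nn.functional as F

# Weight-free adjustment: before concatenating a decoder feature map with its
# encoder skip connection, resize it so the spatial dimensions agree for
# arbitrary input sizes. interpolate() has no parameters, so the pretrained
# weights are untouched.
def match_and_cat(decoder_feat, skip_feat):
    if decoder_feat.shape[-2:] != skip_feat.shape[-2:]:
        decoder_feat = F.interpolate(decoder_feat, size=skip_feat.shape[-2:],
                                     mode='bilinear', align_corners=False)
    return torch.cat([decoder_feat, skip_feat], dim=1)
```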

I have one more question: can the deconv_layer at the end increase the output size for a custom image? For example, when I input an image of size 378x504, the output is 382x512.

OSHMOS avatar Jan 16 '25 18:01 OSHMOS

@Morpheus3000, as far as I can see, we have to resize the input image to 256x256 before feeding it into the network. Simple resizing won't work when we need a dynamic input size and an output of the same size as the input; in that case we would need to change a lot of things in the network and then retrain the model to update the pretrained weights. Please let me know if you have another solution for this.

Problem statement: given an input image of dynamic size (say 1200x800), the network should predict an output of the same size without losing much information during up-sampling. The expected output in this case would be 1200x800.

ansperception avatar Apr 09 '25 06:04 ansperception

> @Morpheus3000 When I inferred with a custom image as input, I got a dimension mismatch error, so I used torch.nn.functional.interpolate() to match the dimensions for inference. I didn't touch any layers that carry weights; I only adjusted the weight-free parts, so I believe it works without problems.
>
> I have one more question: can the deconv_layer at the end increase the output size for a custom image? For example, when I input an image of size 378x504, the output is 382x512.

Hello, the combination of the conv and deconv layers can cause the output size to increase in some cases due to the padding configuration. Since the difference is only a few pixels, the most naive solution is to symmetrically remove the required pixels from all four sides of the output image. Otherwise, you would need to retrain the network, for which you already have access to the code. The hyper-parameters are given in the paper. Hope that helps.
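Something along these lines would do (a rough sketch only; the tensor and function names are illustrative, not from the released code):

```python
import torch

# Trim the prediction back to the input resolution by removing the extra
# padding pixels evenly from all four sides, e.g. 382x512 -> 378x504.
def crop_to_input(pred: torch.Tensor, target_h: int, target_w: int) -> torch.Tensor:
    _, _, h, w = pred.shape                 # NCHW prediction
    top = (h - target_h) // 2
    left = (w - target_w) // 2
    return pred[:, :, top:top + target_h, left:left + target_w]

# e.g. pred = crop_to_input(pred, 378, 504)
```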

Morpheus3000 avatar Apr 09 '25 09:04 Morpheus3000

> @Morpheus3000, as far as I can see, we have to resize the input image to 256x256 before feeding it into the network. Simple resizing won't work when we need a dynamic input size and an output of the same size as the input; in that case we would need to change a lot of things in the network and then retrain the model to update the pretrained weights. Please let me know if you have another solution for this.
>
> Problem statement: given an input image of dynamic size (say 1200x800), the network should predict an output of the same size without losing much information during up-sampling. The expected output in this case would be 1200x800.

Hi! I am not sure about the requirement to resize. The network is fully convolutional, so in theory it should be able to handle arbitrary input sizes, as long as they are larger than a certain base size imposed by the bottleneck. Anything larger than 256 should be fine to process, so that is not a concern here. As for the output size, it may come out a few pixels larger in each dimension; as in the previous comment, this is a consequence of the padding configuration. In my case a square image was acceptable, so it worked. My recommendation would be to drop/crop out the extra edge pixels, since they are just padding artefacts. Otherwise, you would need to retrain the network, for which you already have access to the code. The hyper-parameters are given in the paper. Hope that helps.
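If it helps, a rough sketch of the pad-and-crop approach. The downsampling multiple of 32 here is just an illustrative assumption, not the network's actual stride, and `model` stands in for however you call the released network:

```python
import torch
import torch.nn.functional as F

# Pad the input up to the next multiple of the encoder's downsampling factor,
# run inference, then crop the prediction back to the original 1200x800.
def infer_full_size(model, img, multiple=32):
    h, w = img.shape[-2:]
    pad_h = (multiple - h % multiple) % multiple
    pad_w = (multiple - w % multiple) % multiple
    padded = F.pad(img, (0, pad_w, 0, pad_h), mode='reflect')

    with torch.no_grad():
        pred = model(padded)                # assumed single-output call

    return pred[..., :h, :w]                # drop the padding artefacts

# e.g. pred = infer_full_size(net, img) with img of shape (1, 3, 800, 1200)
```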

Morpheus3000 avatar Apr 09 '25 09:04 Morpheus3000