
How can I get the patches when I denoise a sequence with "test.py"?

Open shentu95 opened this issue 5 years ago • 3 comments

First, suppose I am denoising a sequence of 150 frames, with the non-local search parameters set as follows: the patch size is 41, and the number of frames on which we search for patches is 3.

If I understand correctly, the variable "img_noised_patch_stack" here https://github.com/axeldavy/vnlnet/blob/master/test.py#L183 holds the 3 frames of the sequence that will be input to the network. The first and third frames consist of the most similar patches found on the corresponding frames, and the second frame is the one to be denoised.

So how can I get the similar patches from the first and third frames, and the corresponding patches from the second frame?

In the code "train.py",it is a patch of the image that goes into the network,but it is a whole image in the code "test.py".So l think the similar patches in the code "test.py" just like this.

[sketch map: https://user-images.githubusercontent.com/43443967/63991379-1e553680-cb1a-11e9-92a3-9bf498256183.jpg]

Patches A, B, and C are on three corresponding frames. Because of the padding, the size of the first patch in frame-1 is only 20×20. So can I just get the patches like this?

shentu95 avatar Aug 30 '19 05:08 shentu95

Hi,

you understood the content of img_noised_patch_stack correctly. In the case you describe, img_noised_patch_stack will be a stack of 3 frames: the 2nd frame is exactly the frame to denoise, and the first and third are the center pixels of the similar patches on the previous and next frame respectively.
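To make that layout concrete, here is a minimal sketch (my own illustration with assumed names and frame size, not the repo's code) for a grayscale sequence: the stack has shape (H, W, 3), channel 1 is the noisy frame itself, and channels 0 and 2 hold the gathered center pixels from the previous and next frame.

```python
import numpy as np

# Minimal sketch of the img_noised_patch_stack layout for a grayscale
# sequence with 3 search frames and patch_data_width = 1. All names and
# sizes here are illustrative assumptions.
H, W = 540, 960
frame_prev = np.random.rand(H, W).astype(np.float32)  # frame t-1
frame      = np.random.rand(H, W).astype(np.float32)  # frame t (to denoise)
frame_next = np.random.rand(H, W).astype(np.float32)  # frame t+1

# nn_prev[y, x] / nn_next[y, x] would hold the (y', x') center of the most
# similar patch in frame t-1 / t+1; identity positions stand in here for
# the result of the real OpenCL patch search.
ys, xs = np.mgrid[0:H, 0:W]
nn_prev = np.stack([ys, xs], axis=-1)
nn_next = np.stack([ys, xs], axis=-1)

stack = np.empty((H, W, 3), dtype=np.float32)
stack[..., 0] = frame_prev[nn_prev[..., 0], nn_prev[..., 1]]  # matched centers, t-1
stack[..., 1] = frame                                         # frame to denoise
stack[..., 2] = frame_next[nn_next[..., 0], nn_next[..., 1]]  # matched centers, t+1
```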

There is something that could cause confusion, however: the term 'patches' is used in two different senses.

  1. We search the nearest-neighbor patches along the sequence and create a stack with their center pixels.
  2. During training, similarly to how other networks such as DnCNN are trained, we pass batches of randomly selected patches of the images to be denoised. This is a known trick to speed up and improve training (see the sketch after this list).
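A minimal sketch of that training trick (my own illustration, with assumed shapes and names, not the repo's code): random fixed-size crops of the stacked input and the clean target are drawn to form a training batch.

```python
import numpy as np

def random_training_crops(stack, target, patch_size=64, batch_size=32,
                          rng=np.random.default_rng(0)):
    """Draw a batch of aligned random crops from the (H, W, C) network
    input `stack` and the (H, W) clean `target`. Illustrative only."""
    H, W = target.shape
    ys = rng.integers(0, H - patch_size, size=batch_size)
    xs = rng.integers(0, W - patch_size, size=batch_size)
    inputs  = np.stack([stack[y:y + patch_size, x:x + patch_size]
                        for y, x in zip(ys, xs)])
    targets = np.stack([target[y:y + patch_size, x:x + patch_size]
                        for y, x in zip(ys, xs)])
    return inputs, targets
```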

Thus 1) is what creates img_noised_patch_stack, while 2) is only a training trick. In test.py we denoise one entire image at a time; in train.py, because of 2), that is not the case. This explains the difference you see between test.py and train.py.

I do not understand your last question. Speaking of 1), the processing of the patches goes through three steps:

  1. The OpenCL call returns a buffer with the positions of the top-left corner of the patches: https://github.com/axeldavy/vnlnet/blob/master/video_patch_search.py#L137
  2. The positions are converted to the positions of the center pixel of the patches: https://github.com/axeldavy/vnlnet/blob/master/video_patch_search.py#L148 (note that for VNLnet, patch_data_width = 1).
  3. build_neighbors_array builds the stack of pixels at the computed positions.
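As an illustration of step 3 (my own sketch under assumed array layouts, not the repo's code): with patch_data_width = 1, building the stack reduces to a gather of the video's center pixels at the computed positions.

```python
import numpy as np

def build_neighbors_array_sketch(video, positions):
    """Illustrative gather for patch_data_width = 1 (assumed layouts).
    video:     (num_frames, H, W) grayscale sequence
    positions: (H, W, num_neighbors, 3) integer (f', y', x') patch centers
    returns:   (H, W, num_neighbors) stack of gathered center pixels
    """
    f = positions[..., 0]
    y = positions[..., 1]
    x = positions[..., 2]
    # numpy advanced indexing gathers one pixel per (output pixel, neighbor)
    return video[f, y, x]
```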

If you are interested in just the center pixels, you can use img_noised_patch_stack. However, if you are interested in the patch positions, you can slightly modify video_patch_search.py.

axeldavy avatar Aug 30 '19 07:08 axeldavy

Sorry, I think you may have made a mistake about the content of img_noised_patch_stack. The first and third frames should not be the center pixels of the similar patches; they should be the similar patches themselves.

You can see here https://github.com/axeldavy/vnlnet/blob/master/test.py#L183 that the first and third frames have already been converted from the center pixels of the similar patches into the similar patches by the function "ps.build_neighbors_array()".

My last question is very simple. The first and third frames should consist of similar patches. How can I extract them from the first and third frames?

In this sketch map (https://user-images.githubusercontent.com/43443967/63991379-1e553680-cb1a-11e9-92a3-9bf498256183.jpg), "Frame-1" and "Frame+1" denote the frames composed of similar patches, and "Frame" denotes the frame to denoise. What I want to know is how the similar patches are combined into an image like "Frame-1", so that I can extract them one by one. I made a guess as shown above. So what is the reality?

shentu95 avatar Aug 30 '19 09:08 shentu95

As you can see here https://github.com/axeldavy/vnlnet/blob/master/video_patch_search.py#L154, build_neighbors_array returns an array of size (nn_height, nn_width, c * num_neighbors * patch_data_width * patch_data_width).

c is the number of channels and num_neighbors the number of neighbors (the number of frames). patch_data_width is the size of the "patch" extracted. However, in VideoPatchSearch I differentiate between the size of the patch used for the search (patch_search_width) and the one actually extracted (patch_data_width).
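As a quick sanity check of that formula, with the settings discussed in this thread (grayscale input, 3 frames, and VNLnet's patch_data_width = 1):

```python
# Worked shape example (pure arithmetic from the formula above).
c, num_neighbors, patch_data_width = 1, 3, 1
last_dim = c * num_neighbors * patch_data_width * patch_data_width  # = 3
# -> build_neighbors_array returns (nn_height, nn_width, 3):
#    one center pixel per searched frame, i.e. img_noised_patch_stack.
```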

Extracting a full patch at every pixel is extremely expensive. In addition, for VNLnet we found no benefit in giving the network more data than just the center pixels. This is why patch_data_width is 1 for VNLnet: the "extracted patch" is just 1 pixel wide, i.e. it is just the center pixel.

Thus, if your goal is to extract the patches found at position (x, y) of a given frame, you should use this table: https://github.com/axeldavy/vnlnet/blob/master/test.py#L180

nearest_neighbors_indices[y, x, :] contains the list of indices for position (y, x).

You can see here how to convert the indices to (x', y', f') and then extract the patches: https://github.com/axeldavy/vnlnet/blob/master/video_patch_search.py#L182
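For illustration only (the flat (f', y', x') layout below is an assumption; the actual encoding is at the link above), recovering the coordinates from such an index and slicing out the full patch could look like this:

```python
def extract_patch(video, nearest_neighbors_indices, y, x, k,
                  patch_search_width=41):
    """Hedged sketch: assumes each index is a flat offset into a
    (num_frames, H, W) video volume. Check video_patch_search.py#L182
    for the encoding the repo actually uses."""
    num_frames, H, W = video.shape
    idx = int(nearest_neighbors_indices[y, x, k])
    f_nn, rest = divmod(idx, H * W)   # frame index f'
    y_nn, x_nn = divmod(rest, W)      # center pixel (y', x')
    r = patch_search_width // 2
    # No border handling here; the real code pads the frames.
    return video[f_nn, y_nn - r:y_nn + r + 1, x_nn - r:x_nn + r + 1]
```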

Alternatively, you can use VideoPatchSearch with patch_data_width set to the same value as patch_search_width. build_neighbors_array will then return an array with the full patches at every pixel. Note that the size of the array will be quite huge.
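To put "quite huge" in numbers, here is a back-of-the-envelope estimate (pure arithmetic from the shape formula above; the 960×540 grayscale frame size is an assumption):

```python
# Memory cost of patch_data_width = patch_search_width = 41 with 3 search
# frames, grayscale, float32, for a single output frame.
H, W, c, num_neighbors, w = 540, 960, 1, 3, 41
values_per_pixel = c * num_neighbors * w * w   # 5043 values per pixel
total_bytes = H * W * values_per_pixel * 4     # float32 = 4 bytes
print(total_bytes / 2**30, "GiB")              # ~9.7 GiB per frame
```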

axeldavy avatar Aug 30 '19 10:08 axeldavy