FlowNetPytorch
How to generate the groundtruth optical flow by myself in endoscopy image?
You can't, it's very hard. This is actually the point of the Flying Chairs dataset: real optical flow ground truth requires heavy sensors (like lidar) and algorithms, so we might as well use synthetic data if it's good enough.
So you have 2 strategies:
- Recreate synthetic endoscopy videos, with e.g. Unreal Engine
- Use your videos, and estimate optical flow with the best optical flow algorithm you can (I would recommend RAFT), followed by manual assessment where you discard bad predictions. Note that any further evaluation will only measure agreement with the algorithm you used, which means optimizing against this evaluation can only match the quality of that first algorithm, not improve on it.
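To make the manual-assessment step less tedious, a common automatic pre-filter is a forward-backward consistency check: run the estimator in both directions and discard pixels (or whole frames) where the two flows don't cancel out. A minimal NumPy sketch, assuming `(H, W, 2)` flow arrays in pixel units (function name and threshold are illustrative, not from this thread):

```python
import numpy as np

def fb_consistency_mask(flow_fw, flow_bw, thresh=1.0):
    """Flag pixels where forward and backward flow agree (both (H, W, 2))."""
    h, w = flow_fw.shape[:2]
    grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
    # Where each pixel of frame 1 lands in frame 2 (rounded, clipped to bounds)
    x2 = np.clip(np.round(grid_x + flow_fw[..., 0]).astype(int), 0, w - 1)
    y2 = np.clip(np.round(grid_y + flow_fw[..., 1]).astype(int), 0, h - 1)
    # Backward flow sampled at the landing point should cancel the forward flow
    err = flow_fw + flow_bw[y2, x2]
    return np.linalg.norm(err, axis=-1) < thresh
```

A frame pair where only a small fraction of pixels pass the mask is a good candidate for discarding.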
@ClementPinard Thank you for the answer! But I still have three more questions:
- In the paper, Flying Chairs uses only images to generate the optical flow ground truth, without lidar sensors, so why can't endoscopy images be handled the same way? By the way, do you know how to create Flying Chairs in code? The following is what I quote from the paper about creating the Flying Chairs synthetic dataset:
> The Sintel dataset is still too small to train large CNNs. To provide enough training data, we create a simple synthetic dataset, which we name Flying Chairs, by applying affine transformations to images collected from Flickr and a publicly available set of renderings of 3D chair models [1]. We retrieve 964 images from Flickr with a resolution of 1,024 × 768 from the categories 'city' (321), 'landscape' (129) and 'mountain' (514). We cut the images into 4 quadrants and use the resulting 512 × 384 image crops as background. As foreground objects we add images of multiple chairs from [1] to the background. From the original dataset we remove very similar chairs, resulting in 809 chair types and 62 views per chair available. Examples are shown in Figure 5. To generate motion, we randomly sample 2D affine transformation parameters for the background and the chairs. The chairs' transformations are relative to the background transformation, which can be interpreted as both the camera and the objects moving. Using the transformation parameters we generate the second image, the ground truth optical flow and occlusion regions. All parameters for each image pair (number, types, sizes and initial positions of the chairs; transformation parameters) are randomly sampled. We adjust the random distributions of these parameters in such a way that the resulting displacement histogram is similar to the one from Sintel (details can be found in the supplementary material). Using this procedure, we generate a dataset with 22,872 image pairs and flow fields (we re-use each background image multiple times). Note that this size is chosen arbitrarily and could be larger in principle.
- There is a paper that builds an optical flow ground truth; do you think its method is feasible for creating an endoscopic optical flow ground truth?
- There is also a GitHub repository that explains how to make optical flow ground truth, but I could not find a way to reproduce the code.
Ah, at first I thought you wanted to generate ground truth optical flow from video, which is infeasible (as also stated by the paper you cite). If you want to generate synthetic data, the Flying Chairs method is perfectly possible with any type of background and any type of object. You just need to remember that the data will not be realistic, because no optical flow from 3D movement is seen. So if you have enough data, it might still be possible to get decent results on a real endoscopy video, but you can't be certain before testing.
As for the paper you cite, the method is roughly the same as Flying Chairs, albeit with a structure that better fits the test videos. But in the end it still consists of taking still image pieces and applying known transformations to them, from which you can deduce the optical flow.
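The "known transformation → deduced flow" step is straightforward for affine transforms: the ground-truth flow at each pixel is simply where the transform sends it, minus the pixel itself. A small NumPy sketch (function name is mine, not from the paper):

```python
import numpy as np

def affine_flow(theta, h, w):
    """Ground-truth flow field for a 2D affine transform.

    theta: (2, 3) matrix mapping homogeneous first-frame coords (x, y, 1)
           to second-frame coords (x', y').
    Returns an (h, w, 2) flow array: flow(p) = theta @ (p, 1) - p.
    """
    grid_y, grid_x = np.mgrid[0:h, 0:w].astype(np.float64)
    ones = np.ones_like(grid_x)
    pts = np.stack([grid_x, grid_y, ones], axis=-1)  # (h, w, 3)
    new_pts = pts @ theta.T                          # (h, w, 2)
    return new_pts - pts[..., :2]
```

For a pure translation `theta = [[1, 0, tx], [0, 1, ty]]`, every pixel gets flow `(tx, ty)`; composing a background transform with per-chair transforms, as in the quoted paper, just means overwriting the flow inside each chair's region.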
@ClementPinard Thanks for your kind reply! I have a simple question to double-check: if I have a synthetic optical flow matrix, can I apply the following function to a frame to get a warped image, and regard the synthetic optical flow matrix as the ground truth for these two images?
```python
import cv2
import numpy as np

def warp_flow(img, flow):
    '''
    Inverse-warp the image with cv2.remap
    Args:
        img ndarray: the image to warp
        flow ndarray: the flow array, shape (H, W, 2)
    Returns: the warped image
    '''
    h, w = flow.shape[:2]
    # Build the sampling map; the copy avoids mutating the caller's flow array
    remap = -flow.astype(np.float32)
    remap[:, :, 0] += np.arange(w)
    remap[:, :, 1] += np.arange(h)[:, np.newaxis]
    # cv2.remap requires a float32 map
    res = cv2.remap(img, remap, None, cv2.INTER_LINEAR)
    return res
```
Sorry, I did not respond in time.
I would say yes, with two comments:
- This is an inverse warp rather than a forward warp. It's subtle but important, as you can only apply it with the optical flow from res to img (and not the other way around). If you want an actual forward warp, I found this project that might be interesting for you: https://github.com/lizhihao6/Forward-Warp
- You might want to use PyTorch's grid_sample instead of OpenCV.
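For reference, a minimal sketch of such an inverse warp with grid_sample, assuming `img` is `(N, C, H, W)` and `flow` is `(N, 2, H, W)` in pixel units (layout and function name are assumptions, not from this thread):

```python
import torch
import torch.nn.functional as F

def warp_flow_torch(img, flow):
    """Inverse warp: output(p) = img(p + flow(p)), sampled bilinearly.

    Note the sign convention: unlike the cv2 snippet above, the flow is
    not negated here, so it should point from the output frame into img.
    """
    n, _, h, w = img.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack([xs, ys], dim=0).float()       # (2, H, W), x then y
    grid = base.unsqueeze(0) + flow                   # sample at p + flow(p)
    # grid_sample expects coordinates normalized to [-1, 1]
    gx = 2.0 * grid[:, 0] / max(w - 1, 1) - 1.0
    gy = 2.0 * grid[:, 1] / max(h - 1, 1) - 1.0
    norm_grid = torch.stack([gx, gy], dim=-1)         # (N, H, W, 2)
    return F.grid_sample(img, norm_grid, mode="bilinear",
                         padding_mode="border", align_corners=True)
```

Being pure PyTorch, this stays on the GPU and is differentiable, which matters if the warp is part of a training loss.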