OcclusionFusion

Fitting the 16GB RAFT model on a 2080Ti?

dhruvmetha opened this issue 2 years ago • 15 comments

Hey, a question about the optical flow model training: was the RAFT model shrunk down in some way to fit it on the 11GB GPU that you mention in the paper?

If so, will the code for training the RAFT optical flow model also be released?

dhruvmetha avatar Apr 05 '22 18:04 dhruvmetha

Hi, thanks for your interest!

We only made a few changes to the open-source RAFT implementation to adapt it to RGB-D input, and no additional modifications were made for GPU memory.

wenbin-lin avatar Apr 07 '22 06:04 wenbin-lin

Thanks for the response!

Will the code for this RGB-D adaptation be released? If not, could you give a high-level overview of how I could go about it? I'm trying to replicate the paper for RGB-D inputs using the raw RAFT code. Is it just adding the inverse of the depth as an extra channel to the RGB image, with the rest remaining the same?

This information would be of great help!

dhruvmetha avatar Apr 07 '22 19:04 dhruvmetha

We do not have plans to release the code for RGB-D-based RAFT training for now, but it is actually quite simple to implement. As you mentioned, we just add the inverse of the depth as an extra channel and keep the rest the same.
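
For reference, a minimal sketch of that change (written against the public princeton-vl/RAFT code, not the authors' exact diff): concatenate the inverse depth as a fourth channel and widen the first convolution of RAFT's feature/context encoders from 3 to 4 input channels.

```python
import torch

def make_rgbd_input(rgb, depth, eps=1e-6):
    # rgb: (B, 3, H, W), depth: (B, 1, H, W) in metric units.
    # Assumes invalid/zero depth has been masked or filled beforehand.
    inv_depth = 1.0 / depth.clamp(min=eps)     # inverse depth as the extra channel
    return torch.cat([rgb, inv_depth], dim=1)  # (B, 4, H, W)

# The only architectural change is the input width of the feature and
# context encoders, e.g. in RAFT's BasicEncoder:
#   nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3)
#   -> nn.Conv2d(4, 64, kernel_size=7, stride=2, padding=3)
```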

wenbin-lin avatar Apr 10 '22 08:04 wenbin-lin

Thank you! You mention retraining on three datasets: Sintel, FlyingThings3D, and Monkaa. Do you train on them successively, for 100k iterations each? Sorry to be asking so many questions!

dhruvmetha avatar Apr 11 '22 17:04 dhruvmetha

We train the model successively in the order FlyingThings3D -> Monkaa -> Sintel, for 100k iterations each. If anything about this is unclear, please feel free to let me know.
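
In pseudo-config form, a sketch of that schedule (the train() function and model handle below are placeholders, not the authors' script):

```python
# Hypothetical staged schedule mirroring the description above.
def train(model, dataset, iterations):
    """Placeholder for your RAFT training loop."""
    ...

stages = [
    ("FlyingThings3D", 100_000),
    ("Monkaa",         100_000),
    ("Sintel",         100_000),
]

model = ...  # your RGB-D RAFT model
for dataset_name, num_iters in stages:
    # Model and optimizer state carry over between stages (successive fine-tuning).
    train(model, dataset=dataset_name, iterations=num_iters)
```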

wenbin-lin avatar Apr 12 '22 14:04 wenbin-lin

Do y'all freeze the backbone after training on FlyingThings3D, or just freeze the batchnorm inside the backbone, as done in the original RAFT paper? Also, do you use the smaller FlyingThings3D dataset (the subset used for DispNet/FlowNet2.0)? Thanks in advance, appreciate the help!

dhruvmetha avatar Apr 12 '22 17:04 dhruvmetha

We follow the RAFT implementation and just freeze the batchnorm after training on FlyingThings3D. And we use the full FlyingThings3D dataset instead of the smaller subset.
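
The reference RAFT implementation does this with a freeze_bn() method on the model; if your fork lacks it, a minimal equivalent is:

```python
import torch.nn as nn

def freeze_bn(model):
    # Putting BatchNorm layers in eval mode stops their running-statistic
    # updates; note the affine weight/bias still receive gradients unless
    # you also set requires_grad=False on them.
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            m.eval()
```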

wenbin-lin avatar Apr 14 '22 06:04 wenbin-lin

Thanks, this has been really helpful @wenbin-lin !

dhruvmetha avatar Apr 14 '22 22:04 dhruvmetha

Is the equation from disparity to depth depth = (focal_length * baseline) / (image_width * disparity), which is equivalent to (1050 * 1.0) / (960 * disparity) for FlyingThings3D? And is the inverse depth just 1 - depth, where the depth values range from 0 to 1?

dhruvmetha avatar Apr 15 '22 22:04 dhruvmetha

Your equation is right. The inverse depth is 1 / depth; since the background can contain large depth values, using the inverse depth stabilizes them. In addition, we apply a min-max scaler to the inverse depth: x = (x - x_min) / (x_max - x_min). For convenience, you can just use the disparity as the inverse depth, because the values are the same after min-max scaling.
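
Concretely, a sketch of the scaling and of why disparity and inverse depth coincide after it (assuming x_min/x_max are taken per image): 1/depth is disparity times a positive constant, and any constant factor cancels under min-max scaling.

```python
import numpy as np

def minmax(x, eps=1e-8):
    return (x - x.min()) / (x.max() - x.min() + eps)

# depth = f * B / disparity  =>  1 / depth = disparity / (f * B),
# i.e. inverse depth is disparity times a positive constant, and any
# constant factor cancels under per-image min-max scaling.
f, B = 1050.0, 1.0  # FlyingThings3D focal length (px) and baseline (m)
disparity = np.random.uniform(1.0, 100.0, size=(540, 960)).astype(np.float32)
depth = f * B / disparity
assert np.allclose(minmax(1.0 / depth), minmax(disparity), atol=1e-5)
```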

wenbin-lin avatar Apr 19 '22 03:04 wenbin-lin

Thank you @wenbin-lin

dhruvmetha avatar Apr 20 '22 18:04 dhruvmetha

Do y'all have any rough evaluation results for the optical flow model after each phase of training? This would really help me verify that I'm training the model correctly!

dhruvmetha avatar Apr 21 '22 19:04 dhruvmetha

We are sorry, but we lost the training log. However, we are retraining the RGB-D-based optical flow model, and when the training is done we will share the evaluation results with you.

A rough conclusion is that the evaluation errors of the RGB-D-based method can be significantly lower than those of the RGB-based method. Perhaps you can compare your results with those of the original RGB-based RAFT; your error should be much lower.
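
For that comparison, the standard metric is the average end-point error (EPE); a minimal sketch of the computation:

```python
import torch

def epe(flow_pred, flow_gt, valid=None):
    # End-point error: per-pixel Euclidean distance between predicted and
    # ground-truth flow vectors, averaged over (optionally valid) pixels.
    # flow_pred, flow_gt: (B, 2, H, W); valid: optional (B, H, W) bool mask.
    err = torch.norm(flow_pred - flow_gt, p=2, dim=1)
    return err[valid].mean() if valid is not None else err.mean()
```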

wenbin-lin avatar Apr 26 '22 04:04 wenbin-lin

> Is the equation from disparity to depth depth = (focal_length * baseline) / (image_width * disparity), which is equivalent to (1050 * 1.0) / (960 * disparity) for FlyingThings3D? And is the inverse depth just 1 - depth, where the depth values range from 0 to 1?

> Your equation is right. The inverse depth is 1 / depth; since the background can contain large depth values, using the inverse depth stabilizes them. In addition, we apply a min-max scaler to the inverse depth: x = (x - x_min) / (x_max - x_min). For convenience, you can just use the disparity as the inverse depth, because the values are the same after min-max scaling.

Wait, shouldn't it be focal_length * baseline / disparity? Why do we need to multiply disparity by image_width there? Also, is the scaler applied to each depth map individually, or are x_min and x_max taken over the whole dataset?

phamtrongthang123 avatar Jul 09 '22 04:07 phamtrongthang123

@wenbin-lin Hi, is there any update on retraining the RGB-D optical flow model? I'm working on a research project and am eager to try your method out!

Guptajakala avatar Apr 19 '23 00:04 Guptajakala