OcclusionFusion

Fitting the 16GB RAFT model on a 2080Ti?

dhruvmetha opened this issue 2 years ago • 15 comments

Hey, a question about the optical flow model training: was the RAFT model shrunk down in some way to fit it on the 11GB GPU that you mention in the paper?

If so, will the code for training the RAFT optical flow model also be released?

dhruvmetha avatar Apr 05 '22 18:04 dhruvmetha

Hi, thanks for your interest!

We only made a few changes to the open-source RAFT implementation to adapt it to RGB-D input, and no additional modifications were made for GPU memory.

wenbin-lin avatar Apr 07 '22 06:04 wenbin-lin

Thanks for the response!

Will the code for this RGB-D adaptation be released? If not, could you give a high-level overview of how I could go about it? I'm trying to replicate the paper for RGB-D inputs using the raw RAFT code. Is it just adding the inverse of the depth as an extra channel to the RGB image, with the rest remaining the same?

This information would be of great help!

dhruvmetha avatar Apr 07 '22 19:04 dhruvmetha

We do not have plans to release the code for RGB-D-based RAFT training for now, but it is actually quite simple to implement. As you mentioned, we just add the inverse of the depth as an extra channel and keep the rest the same.
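
For reference, a minimal sketch of that change (written against the public princeton-vl/RAFT code, not the authors' exact diff): concatenate the inverse depth as a fourth channel and widen the first convolution of RAFT's feature/context encoders from 3 to 4 input channels.

```python
import torch

def make_rgbd_input(rgb, depth, eps=1e-6):
    # rgb: (B, 3, H, W), depth: (B, 1, H, W) in metric units.
    # Assumes invalid/zero depth has been masked or filled beforehand.
    inv_depth = 1.0 / depth.clamp(min=eps)     # inverse depth as the extra channel
    return torch.cat([rgb, inv_depth], dim=1)  # (B, 4, H, W)

# The only architectural change is the input width of the feature and
# context encoders, e.g. in RAFT's BasicEncoder:
#   nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3)
#   -> nn.Conv2d(4, 64, kernel_size=7, stride=2, padding=3)
```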

wenbin-lin avatar Apr 10 '22 08:04 wenbin-lin

Thank you! You mention retraining on three datasets: Sintel, FlyingThings3D, and Monkaa. Do you train on them successively, for 100k iterations each? Sorry to be asking so many questions!

dhruvmetha avatar Apr 11 '22 17:04 dhruvmetha

We train the model successively in the order FlyingThings3D -> Monkaa -> Sintel, for 100k iterations each. If anything about this is unclear, please feel free to let me know.
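
In pseudo-config form, a sketch of that schedule (the train() function and model handle below are placeholders, not the authors' script):

```python
# Hypothetical staged schedule mirroring the description above.
def train(model, dataset, iterations):
    """Placeholder for your RAFT training loop."""
    ...

stages = [
    ("FlyingThings3D", 100_000),
    ("Monkaa",         100_000),
    ("Sintel",         100_000),
]

model = ...  # your RGB-D RAFT model
for dataset_name, num_iters in stages:
    # Model and optimizer state carry over between stages (successive fine-tuning).
    train(model, dataset=dataset_name, iterations=num_iters)
```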

wenbin-lin avatar Apr 12 '22 14:04 wenbin-lin

Do y'all freeze the backbone after training on FlyingThings3D, or just freeze the batchnorm inside the backbone, as done in the original RAFT paper? Also, do you use the smaller FlyingThings3D dataset (the subset used for DispNet/FlowNet2.0)? Thanks in advance, appreciate the help!

dhruvmetha avatar Apr 12 '22 17:04 dhruvmetha

We follow the RAFT implementation and just freeze the batchnorm after training on FlyingThings3D. And we use the full FlyingThings3D dataset instead of the smaller subset.
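
The reference RAFT implementation does this with a freeze_bn() method on the model; if your fork lacks it, a minimal equivalent is:

```python
import torch.nn as nn

def freeze_bn(model):
    # Putting BatchNorm layers in eval mode stops their running-statistic
    # updates; note the affine weight/bias still receive gradients unless
    # you also set requires_grad=False on them.
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            m.eval()
```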

wenbin-lin avatar Apr 14 '22 06:04 wenbin-lin

Thanks, this has been really helpful @wenbin-lin !

dhruvmetha avatar Apr 14 '22 22:04 dhruvmetha

Is the equation from disparity to depth depth = (focal_length * baseline) / (image_width * disparity), which is equivalent to (1050 * 1.0) / (960 * disparity) for FlyingThings3D? And is the inverse depth just 1 - depth, where the depth values range from 0 to 1?

dhruvmetha avatar Apr 15 '22 22:04 dhruvmetha

Your equation is right. The inverse depth is 1 / depth; since the background can contain large depth values, using the inverse depth stabilizes them. In addition, we apply a min-max scaler to the inverse depth: x = (x - x_min) / (x_max - x_min). For convenience, you can just use the disparity as the inverse depth, because the values are the same after min-max scaling.
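
Concretely, a sketch of the scaling and of why disparity and inverse depth coincide after it (assuming x_min/x_max are taken per image): 1/depth is disparity times a positive constant, and any constant factor cancels under min-max scaling.

```python
import numpy as np

def minmax(x, eps=1e-8):
    return (x - x.min()) / (x.max() - x.min() + eps)

# depth = f * B / disparity  =>  1 / depth = disparity / (f * B),
# i.e. inverse depth is disparity times a positive constant, and any
# constant factor cancels under per-image min-max scaling.
f, B = 1050.0, 1.0  # FlyingThings3D focal length (px) and baseline (m)
disparity = np.random.uniform(1.0, 100.0, size=(540, 960)).astype(np.float32)
depth = f * B / disparity
assert np.allclose(minmax(1.0 / depth), minmax(disparity), atol=1e-5)
```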

wenbin-lin avatar Apr 19 '22 03:04 wenbin-lin

Thank you @wenbin-lin

dhruvmetha avatar Apr 20 '22 18:04 dhruvmetha

Do y'all have any rough evaluation results for the optical flow model after each phase of training? This would really help me verify that I'm training the model correctly!

dhruvmetha avatar Apr 21 '22 19:04 dhruvmetha

We are sorry, but we lost the training log. However, we are retraining the RGB-D-based optical flow model, and when the training is done we will share the evaluation results with you.

A rough conclusion is that the evaluation errors of the RGB-D-based method can be significantly lower than those of the RGB-based method. Perhaps you can compare your results with those of the original RGB-based RAFT; your error should be much lower.
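
For that comparison, the standard metric is the average end-point error (EPE); a minimal sketch of the computation:

```python
import torch

def epe(flow_pred, flow_gt, valid=None):
    # End-point error: per-pixel Euclidean distance between predicted and
    # ground-truth flow vectors, averaged over (optionally valid) pixels.
    # flow_pred, flow_gt: (B, 2, H, W); valid: optional (B, H, W) bool mask.
    err = torch.norm(flow_pred - flow_gt, p=2, dim=1)
    return err[valid].mean() if valid is not None else err.mean()
```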

wenbin-lin avatar Apr 26 '22 04:04 wenbin-lin

> Is the equation from disparity to depth depth = (focal_length * baseline) / (image_width * disparity), which is equivalent to (1050 * 1.0) / (960 * disparity) for FlyingThings3D? And is the inverse depth just 1 - depth, where the depth values range from 0 to 1?

> Your equation is right. The inverse depth is 1 / depth; since the background can contain large depth values, using the inverse depth stabilizes them. In addition, we apply a min-max scaler to the inverse depth: x = (x - x_min) / (x_max - x_min). For convenience, you can just use the disparity as the inverse depth, because the values are the same after min-max scaling.

Wait, shouldn't it be focal_length * baseline / disparity? Why do we need to multiply disparity by image_width there? Also, is the scaler applied to each depth map individually, or are x_min and x_max taken over the whole dataset?

phamtrongthang123 avatar Jul 09 '22 04:07 phamtrongthang123

@wenbin-lin Hi, is there any update on retraining the RGB-D optical flow model? I'm working on a research project and am eager to try your method out!

Guptajakala avatar Apr 19 '23 00:04 Guptajakala