Gen6D
How to improve the results
I've been playing with Gen6D a couple of times, first with the mouse and then with custom objects, but I still can't figure out how to get everything right on the first attempt. So I decided to use a relatively long reference video, and the same video as the query video, but I still don't get it right.
In this example, I have a reference video with 6885 frames. Only 689 of them were used with COLMAP to build the 3D model, which took 6 hours, by the way. I then estimated the meta info using these instructions and got these values:
0.984891 -0.219880 -0.156923
-0.0717203 0.377376 -0.923279
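A quick way to sanity-check values like these is to look at their lengths and the angle between them. The sketch below assumes the two rows are meant to be two perpendicular direction vectors of the object; that interpretation is an assumption of this example, not something stated in the post, and the helper names are made up for illustration. It only reports numbers and does not assume anything about how Gen6D processes the file internally.

```python
import math

# Values copied from the post above. What each row represents (which axis,
# whether unit length is required) is an ASSUMPTION of this sketch.
first = (0.984891, -0.219880, -0.156923)
second = (-0.0717203, 0.377376, -0.923279)


def norm(v):
    """Euclidean length of a 3-vector."""
    return math.sqrt(sum(c * c for c in v))


def angle_deg(a, b):
    """Angle between two vectors in degrees, via the normalized dot product."""
    cos = sum(x * y for x, y in zip(a, b)) / (norm(a) * norm(b))
    cos = max(-1.0, min(1.0, cos))  # guard against rounding just outside [-1, 1]
    return math.degrees(math.acos(cos))


print(f"|first|  = {norm(first):.4f}")
print(f"|second| = {norm(second):.4f}")
print(f"angle    = {angle_deg(first, second):.2f} deg")
```

For the values above, the first row comes out slightly longer than unit length (about 1.02), the second is essentially unit length, and the two are within about half a degree of perpendicular. Whether that small deviation matters depends on how the values are consumed downstream.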
Here is my 3D model. I expected better results with 689 frames:
After that, I ran the prediction on the same reference video, and the detection fails:
1- So how can I guarantee detection on the first attempt? Here I use 689 images for reconstruction and 6840 for detection (the reference images are a subset of the query images), but the detection still fails. If the detection fails even though the reference and query data come from the same video, does that mean the metadata is incorrect? I don't see any reason why it would fail in that case. It is also strange to me because there is almost no scale difference between the reference and the query images.
But after a couple of frames, the tracking works correctly. Here are the frames:
Frame 1/6885
Frame 624/6885
Frame 640/6885
Then it works till the end of the video.
2- When I compute the metadata with CloudCompare (CC) a couple of times for the same model (without closing CC), I get slightly different results. Is this normal behavior, and does it affect the performance of Gen6D?
- Hi, the detection is indeed not very stable, especially when there is a large scale difference. Regarding the scale difference, you may refer to https://github.com/liuyuan-pal/Gen6D/issues/29#issuecomment-1308379909. We always resize the reference images, so there is still a large scale difference even when the original reference sequence is used as the query. The results suddenly get better because we crop the image according to the current pose: the refiner happens to find a pose near the object, and thus the subsequent steps are correct. You may refer to the intermediate results as described at https://github.com/liuyuan-pal/Gen6D#qualitative-results.
- Slightly changing the metadata will not change the final results. This is normal behavior for Gen6D.
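To put a number on "slightly different", one option is to measure the angle between the direction estimated in one run and the direction estimated in another. The sketch below is only an illustration: `run_b` is a hypothetical second estimate invented for the example (the thread does not include one), and the function name is made up. It uses `atan2` on the cross and dot products rather than `acos`, since that stays accurate for very small angles.

```python
import math


def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    """Dot product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))


def drift_deg(a, b):
    """Angle between a and b in degrees; inputs need not be unit length."""
    c = cross(a, b)
    return math.degrees(math.atan2(math.sqrt(dot(c, c)), dot(a, b)))


run_a = (0.984891, -0.219880, -0.156923)  # first row posted in the question
run_b = (0.98, -0.23, -0.15)              # HYPOTHETICAL second estimate

print(f"drift between runs: {drift_deg(run_a, run_b):.2f} deg")
```

With these example inputs the drift is below one degree. No threshold is implied here; the reply above only says that small changes do not affect the final results.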