
Megadepth dataset and training convergence

Open Odin-byte opened this issue 8 months ago • 4 comments

First of all thank you for providing such a well documented repository including your training code.

I have not used your download tool to download the megadepth_v1 dataset, but downloaded it myself beforehand.

What I stumbled upon while looking over your code is that your provided download tool warns that the MegaDepth dataset uses around 500 GB of space, but the extracted ZIP downloaded from the Cornell servers is only around 200 GB in size.
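In case it helps anyone checking the same thing, here is a quick way to measure the on-disk size of the extracted dataset (the path is just an example from my setup, adjust it to yours):

```python
import os

def dir_size_gb(root: str) -> float:
    """Sum the sizes of all regular files under root, in GB."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):  # skip symlinks to avoid double counting
                total += os.path.getsize(path)
    return total / 1024**3

# Example path to the extracted archive; replace with your own location.
print(f"{dir_size_gb('/data/megadepth_v1'):.1f} GB")
```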

While the training script does run (it is running as I type this), the network seems to be converging, but rather slowly, and the combined loss after 60% training progress is still rather high (~5.0).

Could the much smaller MegaDepth dataset size be part of the problem here?

The structure of my training data seems to match the requirements.

I would be glad for some words of guidance. @guipotje @felipecadar @renatojmsdh @ericksonrn

Cheers!

Odin-byte avatar May 06 '25 14:05 Odin-byte

Sorry, I misunderstood the size warning for the MegaDepth dataset. It seems to be the correct dataset. But the issue of a high loss (~4.7) remains after all 160,000 steps.

Could the missing preprocessed D2-Net MegaDepth images be part of the problem? These preprocessed images, provided by the LoFTR repo, are no longer available.

Thanks for your help!

Odin-byte avatar May 07 '25 09:05 Odin-byte

I trained the network on the COCO dataset only, which did not result in a loss improvement.

Odin-byte avatar May 08 '25 10:05 Odin-byte

@Odin-byte Hi, did you find the potential reason? I tried some data and hyper-parameter combinations, but the loss is still more than 4.5 after 160,000 steps.

dinglei0719 avatar Aug 09 '25 06:08 dinglei0719

@dinglei0719 Hey, I guess the loss definition might be causing a relatively high total loss. But even though the loss feels pretty high, the network does perform as expected. I ran the provided MegaDepth evaluation script using my trained weights (saved after the full training, with a final loss of around 5) and got the same values as presented in the paper.

To sum it up: the high total loss is an artifact of how the losses are defined, but the evaluation performance is as good as expected.
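As a rough illustration of why the total can plateau around 5 (the component names and values below are made up, not the actual terms from the training code): when the total loss is a sum of several terms that each bottom out at a non-zero floor, the sum stays well above zero even after convergence.

```python
# Illustrative only: hypothetical per-component losses after convergence.
# The names and values are invented and do not match the repo's actual loss terms.
component_losses = {
    "keypoint_detection": 1.2,
    "descriptor_matching": 1.6,
    "reliability": 0.7,
    "coarse_to_fine": 1.5,
}

# The combined loss is just the sum, so it plateaus near ~5.0
# even though every individual term has stopped improving.
total = sum(component_losses.values())
print(f"total: {total:.1f}")
for name, value in component_losses.items():
    print(f"  {name}: {value:.2f}")
```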

Odin-byte avatar Aug 11 '25 15:08 Odin-byte