Issues faced during running the training and test scripts

Open kkaytekin opened this issue 1 year ago • 17 comments

Hello again, I would like to share some issues that I faced while running the training script. Note that I have prepared the datasets as explained in the R2D2 repository.

  1. Mismatch of dimensions during det_loss calculation: While executing this line I get
RuntimeError: The size of tensor a (64) must match the size of tensor b (65) at non-singleton dimension 1

I solved this by replacing line 357 as follows:

        elif self.detloss in ['ce']:
            # det_loss = self.det_loss(pred_score=output["semi"], gt_score=output["gt_semi"], weight=output["weight"],
            #                          stability_map=None)
            det_loss = self.det_loss(pred_score=output["semi"], gt_score=output["gt_semi_norm"], weight=output["weight"],
                                     stability_map=None)

I think this error is caused by passing the wrong values. In the inputs, we have

output["gt_semi"].shape = (4,64,64,64) (==gt_score)
output["semi"].shape = (4,65,64,64) (==pred_score)

The output dict also contains

output["gt_semi_norm"] with shape (4,65,64,64)

So I replaced gt_semi with gt_semi_norm, which has matching dimensions. I am not sure if this is a valid solution.
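A small guard placed before the loss call can make this kind of mismatch easier to diagnose. The sketch below is only an illustration: `check_same_shape` is a hypothetical helper, not part of the sfd2 code, and it assumes both inputs expose a `.shape` attribute as tensors do. Plain stand-in objects carry the shapes reported above so the example runs without PyTorch.

```python
from types import SimpleNamespace


def check_same_shape(pred, target, pred_name="pred_score", target_name="gt_score"):
    """Raise a descriptive error if two tensor-like objects differ in shape.

    Hypothetical helper: works on anything exposing a `.shape` attribute.
    """
    pred_shape, target_shape = tuple(pred.shape), tuple(target.shape)
    if pred_shape != target_shape:
        raise ValueError(
            f"{pred_name} has shape {pred_shape} but {target_name} has shape {target_shape}"
        )


# Stand-ins carrying the shapes reported above (no real tensors needed here).
semi = SimpleNamespace(shape=(4, 65, 64, 64))
gt_semi = SimpleNamespace(shape=(4, 64, 64, 64))
gt_semi_norm = SimpleNamespace(shape=(4, 65, 64, 64))

check_same_shape(semi, gt_semi_norm)  # passes: shapes agree
try:
    check_same_shape(semi, gt_semi)  # fails: 65 vs 64 channels
except ValueError as err:
    mismatch_message = str(err)
```

Matching shapes only rules out the runtime error; it does not confirm that gt_semi_norm is the target the loss was meant to receive.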

  2. Learning rate decay parameters are not specified. In trainer.py, line 166, the interpreter complains that self.args.decay_rate and self.args.decay_iter cannot be found. Indeed, they are specified neither in the argparser nor in the config file. The workaround for now is to disable learning rate decay by replacing line 166 with
#lr = min(self.args.lr * self.args.decay_rate ** (self.iteration - self.args.decay_iter), self.args.lr)
lr = self.args.lr

I think this change will prevent us from replicating the results in the paper.
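An alternative to commenting out the decay would be to register the two missing options with neutral defaults. The following is a minimal sketch under assumptions: the flag names `--decay_rate` and `--decay_iter` simply mirror the attribute names in the error, the base learning rate is a placeholder, and `build_parser` and `current_lr` are hypothetical names rather than functions from the repository.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser()
    # Placeholder base learning rate, not a value taken from the repository.
    parser.add_argument("--lr", type=float, default=1e-4)
    # Assumed flags; decay_rate=1.0 leaves the learning rate unchanged.
    parser.add_argument("--decay_rate", type=float, default=1.0)
    parser.add_argument("--decay_iter", type=int, default=0)
    return parser


def current_lr(args, iteration):
    # Same expression as the commented-out line 166 quoted above.
    return min(args.lr * args.decay_rate ** (iteration - args.decay_iter), args.lr)


args = build_parser().parse_args([])
lr_at_1000 = current_lr(args, iteration=1000)  # equals args.lr with the neutral defaults
```

With these defaults the behaviour matches the workaround (constant learning rate), while real values can be supplied once they are known.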

Also, while running the test script test_aachenv_1_1 there are some matters I would like to mention:

  1. I am not sure whether to use the Aachen dataset that we prepared during the training, or Aachen v1.1 dataset that we can find online (for example, I downloaded it from here, as mentioned in the readme file). Since the datasets might be different, I would like to ask if there are any specific preprocessing steps I should follow to reproduce your results?
  2. In line 31 of the test script, the file pairs-db-covis20.txt is missing. I found it here, but since I found this file and the aachen v1.1 database from different sources, I wanted to ask if there is some other source I should download the aachen v1.1 dataset from, maybe a source including this file already?
  3. We need to specify an outputs folder, as shown here. Does that mean I should first run some other script to do inference and collect the results under the outputs folder I specified?
  4. Missing file aachen_db_imglist.txt here. Google search for this file was not successful.
  5. Missing file day_night_time_queries_with_intrinsics.txt here. Google search for this file was not successful. The aachen v1.1 dataset I mentioned above only has night_time_queries_with_intrinsics.txt.

Thank you very much and best regards,

kkaytekin avatar Jul 06 '23 17:07 kkaytekin

same problem...

meng152634 avatar Aug 17 '23 01:08 meng152634

same problem.

1561213 avatar Sep 12 '23 13:09 1561213

And in trainer.py, line 385, eval_out = self.eval_on_data() is likely not defined, so I get

AttributeError: 'Trainer' object has no attribute 'eval_on_data'

Thanks.
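Until the method is available, one way to keep training running is to call validation only when it actually exists. This is a sketch, not the repository's code: the `Trainer` class below is a minimal stand-in, `maybe_validate` is a hypothetical method name, and `do_val` is modelled as a simple boolean mirroring the option mentioned elsewhere in this thread.

```python
class Trainer:
    """Minimal stand-in for the real trainer; it has no eval_on_data method."""

    def __init__(self, do_val=True):
        self.do_val = do_val

    def maybe_validate(self):
        """Run validation only if it is requested and actually implemented."""
        eval_fn = getattr(self, "eval_on_data", None)
        if not self.do_val or not callable(eval_fn):
            return None  # validation skipped
        return eval_fn()


class TrainerWithEval(Trainer):
    def eval_on_data(self):
        # Placeholder metric, for illustration only.
        return {"val_loss": 0.0}


skipped = Trainer().maybe_validate()            # None: method is missing
evaluated = TrainerWithEval().maybe_validate()  # dict returned by eval_on_data
```

Skipping validation this way avoids the AttributeError but also means no validation metrics are produced during training.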

1561213 avatar Sep 13 '23 02:09 1561213

same problem.

XZYuann avatar Oct 14 '23 10:10 XZYuann

I found that in the March 9th version of the code, config_train_r2d2.json sets decay_rate=0.99996 and decay_iter=80000.
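To see what these two values imply, they can be plugged into the expression from trainer.py line 166 quoted in the first post. The sketch below assumes a base learning rate of 1e-4 purely for illustration (it is not taken from the repository), and `decayed_lr` is a hypothetical helper name.

```python
def decayed_lr(base_lr, iteration, decay_rate=0.99996, decay_iter=80000):
    """Learning rate as computed by the expression quoted from trainer.py."""
    return min(base_lr * decay_rate ** (iteration - decay_iter), base_lr)


base_lr = 1e-4  # hypothetical base learning rate, for illustration only
schedule = {it: decayed_lr(base_lr, it) for it in (0, 80000, 100000, 200000)}
for it, lr in schedule.items():
    print(f"iteration {it:>6}: lr = {lr:.3e}")
```

Because of the `min`, the rate stays at the base value up to iteration 80000 and decays exponentially afterwards.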

pQWQq avatar Oct 23 '23 07:10 pQWQq

+1

zhengshunkai avatar Nov 24 '23 07:11 zhengshunkai

This file may be the right one: https://github.com/cvg/Hierarchical-Localization/blob/master/pairs/aachen_v1.1/pairs-db-covis20.txt

zhengshunkai avatar Nov 24 '23 09:11 zhengshunkai

aachen_db_imglist.txt is not used; day_night_time_queries_with_intrinsics.txt may be the day and night query lists combined.
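If that reading is right, the combined file could be produced by concatenating a day-time and a night-time query list. The sketch below is an assumption, not a confirmed preprocessing step: `merge_query_lists` is a hypothetical helper, it presumes a separate day-time list is available, and the demo uses placeholder entries in a temporary directory rather than real Aachen data.

```python
from pathlib import Path
import tempfile


def merge_query_lists(day_list, night_list, merged_list):
    """Concatenate two query list files into one, dropping blank lines."""
    lines = []
    for path in (day_list, night_list):
        for line in Path(path).read_text().splitlines():
            if line.strip():
                lines.append(line.strip())
    Path(merged_list).write_text("\n".join(lines) + "\n")
    return len(lines)


# Self-contained demo with placeholder entries (not real dataset content).
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    (tmp / "day.txt").write_text("day_entry_1\n")
    (tmp / "night.txt").write_text("night_entry_1\n\n")
    merged_count = merge_query_lists(tmp / "day.txt", tmp / "night.txt", tmp / "day_night.txt")
    merged_text = (tmp / "day_night.txt").read_text()
```

Whether the test script expects exactly this ordering and format has not been confirmed in this thread.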

zhengshunkai avatar Nov 25 '23 07:11 zhengshunkai

jiu that's why?

Inverse-function avatar Dec 21 '23 13:12 Inverse-function

And in trainer.py, line 385, eval_out = self.eval_on_data() is likely not defined, so I get

AttributeError: 'Trainer' object has no attribute 'eval_on_data'

Thanks.

@1561213 I have the same issue. Have you solved this problem?

eronez avatar Apr 23 '24 12:04 eronez

I am pleasantly surprised to receive your reply. There are still many issues that have not been resolved, such as the path settings for colmap and many files used for evaluation on the aachen dataset. Could you please package your project and send it to me? Thanks.

Inverse-function avatar Apr 24 '24 06:04 Inverse-function

Hi,

Thank you for your interest in our work. I will fix these bugs and update the code.

feixue94 avatar Apr 25 '24 09:04 feixue94

jiu that's why?

Your mmseg version is too high. You can choose a compatible one by following: https://mmsegmentation.readthedocs.io/zh-cn/0.x/faq.html

liutao23 avatar May 17 '24 04:05 liutao23

jiu that's why?

Your mmseg version is too high. You can choose a compatible one by following: https://mmsegmentation.readthedocs.io/zh-cn/0.x/faq.html

Thank you.

Inverse-function avatar May 20 '24 03:05 Inverse-function

Hi guys, have you solved the issue of the missing implementation of eval_on_data?

Adolfhill avatar Jun 19 '24 06:06 Adolfhill

And in trainer.py, line 385, eval_out = self.eval_on_data() is likely not defined, so I get

AttributeError: 'Trainer' object has no attribute 'eval_on_data'

Thanks.

I have the same problem. If I set do_val to 0 to skip this function, will it affect the training results?

zhukaifeng390 avatar Aug 14 '24 08:08 zhukaifeng390