pytorch-dense-correspondence

ZeroDivisionError: float division by zero in loss function

Open ZhangZhaofeng opened this issue 6 years ago • 5 comments

Running the notebook

dense_correspondence/experiments/shoes/training_shoes.ipynb

raises a ZeroDivisionError.

The dataset is

shoe_train_4_shoes.yaml

It seems that in some cases num_matches == 0, which causes the error. How can I fix it?

The error message is as follows:

ZeroDivisionError                         Traceback (most recent call last)
in ()
      5 print "training descriptor of dimension %d" %(d)
      6 train = DenseCorrespondenceTraining(dataset=dataset, config=train_config)
----> 7 train.run()
      8 print "finished training descriptor of dimension %d" %(d)

/home/zhang/code/dense_correspondence/training/training.pyc in run(self, loss_current_iteration, use_pretrained)
    340 masked_non_matches_a, masked_non_matches_b,
    341 background_non_matches_a, background_non_matches_b,
--> 342 blind_non_matches_a, blind_non_matches_b)
    343
    344

/home/zhang/code/dense_correspondence/loss_functions/loss_composer.pyc in get_loss(pixelwise_contrastive_loss, match_type, image_a_pred, image_b_pred, matches_a, matches_b, masked_non_matches_a, masked_non_matches_b, background_non_matches_a, background_non_matches_b, blind_non_matches_a, blind_non_matches_b)
     31 masked_non_matches_a, masked_non_matches_b,
     32 background_non_matches_a, background_non_matches_b,
---> 33 blind_non_matches_a, blind_non_matches_b)
     34
     35 if (match_type == SpartanDatasetDataType.SINGLE_OBJECT_ACROSS_SCENE).all():

/home/zhang/code/dense_correspondence/loss_functions/loss_composer.pyc in get_within_scene_loss(pixelwise_contrastive_loss, image_a_pred, image_b_pred, matches_a, matches_b, masked_non_matches_a, masked_non_matches_b, background_non_matches_a, background_non_matches_b, blind_non_matches_a, blind_non_matches_b)
     82 matches_a, matches_b,
     83 masked_non_matches_a, masked_non_matches_b,
---> 84 M_descriptor=pcl._config["M_masked"])
     85
     86 if pcl._config["use_l2_pixel_loss_on_background_non_matches"]:

/home/zhang/code/dense_correspondence/loss_functions/pixelwise_contrastive_loss.py in get_loss_matched_and_non_matched_with_l2(self, image_a_pred, image_b_pred, matches_a, matches_b, non_matches_a, non_matches_b, M_descriptor, M_pixel, non_match_loss_weight, use_l2_pixel_loss)
     84
     85
---> 86 match_loss, _, _ = PCL.match_loss(image_a_pred, image_b_pred, matches_a, matches_b)
     87
     88

/home/zhang/code/dense_correspondence/loss_functions/pixelwise_contrastive_loss.py in match_loss(image_a_pred, image_b_pred, matches_a, matches_b)
    169 #print(match_loss)
    170 #else:
--> 171 match_loss = 1.0 / num_matches * (matches_a_descriptors - matches_b_descriptors).pow(2).sum()
    172
    173 return match_loss, matches_a_descriptors, matches_b_descriptors

ZeroDivisionError: float division by zero

ZhangZhaofeng • Aug 02 '19 07:08

I think this occurs when the dataloader returns an empty set of matches (which happens very rarely and is probably why we didn't catch it earlier). Should be a simple fix to add a check on whether the size is zero and skip it if so. Will PR a fix as soon as we have time.
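For illustration, the "check whether the size is zero and skip" idea could be sketched as below. This is a minimal, framework-free sketch and not the repository's actual fix: `has_matches`, `run_training`, the `samples` dict layout, and `compute_loss` are hypothetical names standing in for the real dataloader output and loss call.

```python
def has_matches(matches_a, matches_b):
    """Return True only if both match index collections are non-empty."""
    return len(matches_a) > 0 and len(matches_b) > 0


def run_training(samples, compute_loss):
    """Iterate over samples, skipping any whose match set is empty.

    `samples` is a hypothetical iterable of dicts with "matches_a" and
    "matches_b" entries; `compute_loss` is a stand-in for the real loss call.
    Returns the list of computed losses and the number of skipped samples.
    """
    losses = []
    skipped = 0
    for sample in samples:
        if not has_matches(sample["matches_a"], sample["matches_b"]):
            skipped += 1
            continue  # nothing to average over, so skip this iteration
        losses.append(compute_loss(sample))
    return losses, skipped
```

Counting the skipped samples makes it easy to confirm that empty match sets really are rare, as suspected above.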

manuelli • Oct 24 '19 13:10

See the branch I linked at the end of this issue: https://github.com/RobotLocomotion/pytorch-dense-correspondence/issues/204

I think that should fix it?


peteflorence • Oct 24 '19 13:10

Yes, that fixes it, so that's a good short-term solution. @peteflorence, if you open a PR I am happy to review and merge.

manuelli • Oct 24 '19 13:10

After applying @peteflorence's fix, the dense_correspondence/experiments/shoes/training_shoes.ipynb notebook now runs fine. However, I still get exactly the same "ZeroDivisionError: float division by zero" error as in the OP when running the mug training notebook: dense_correspondence/experiments/mugs/training_mugs.ipynb

bhazza • Nov 18 '19 12:11

I also got the same error when training on mugs_all.yaml. Here is the entire message. It looks like one of the masks is empty.

warning, empty mask b
Traceback (most recent call last):
  File "training_tutorial.py", line 64, in <module>
    train.run()
  File "/home/yixuan/pytorch-dense-correspondence/dense_correspondence/training/training.py", line 347, in run
    gebm_a, gebm_b)
  File "/home/yixuan/pytorch-dense-correspondence/dense_correspondence/loss_functions/loss_composer.py", line 37, in get_loss
    graph_embedding_a, graph_embedding_b)
  File "/home/yixuan/pytorch-dense-correspondence/dense_correspondence/loss_functions/loss_composer.py", line 95, in get_within_scene_loss
    M_descriptor=0.5)
  File "/home/yixuan/pytorch-dense-correspondence/dense_correspondence/loss_functions/pixelwise_contrastive_loss.py", line 85, in get_loss_matched_and_non_matched_with_l2
    match_loss, _, _ = PCL.match_loss(image_a_pred, image_b_pred, matches_a, matches_b)
  File "/home/yixuan/pytorch-dense-correspondence/dense_correspondence/loss_functions/pixelwise_contrastive_loss.py", line 165, in match_loss
    match_loss = 1.0 / num_matches * (matches_a_descriptors - matches_b_descriptors).pow(2).sum()
ZeroDivisionError: float division by zero
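Since the division happens inside match_loss itself, one possible workaround is to guard the expression there. The following is only a sketch under assumptions: it uses plain Python lists of descriptor vectors in place of the torch tensors in the traceback, `guarded_match_loss` is a hypothetical name, and returning 0.0 for an empty match set is an assumed convention rather than the maintainers' chosen behavior.

```python
def guarded_match_loss(descriptors_a, descriptors_b):
    """Mean squared L2 distance between matched descriptors.

    Mirrors 1.0 / num_matches * (a - b).pow(2).sum() from the traceback,
    but returns 0.0 instead of dividing when there are no matches.
    Each argument is a list of equal-length descriptor vectors.
    """
    num_matches = len(descriptors_a)
    if num_matches == 0:
        return 0.0  # assumption: an empty match set contributes no loss
    squared_sum = sum(
        (x - y) ** 2
        for vec_a, vec_b in zip(descriptors_a, descriptors_b)
        for x, y in zip(vec_a, vec_b)
    )
    return squared_sum / num_matches
```

Note that a zero loss silently hides the empty mask, so logging the event (as the "warning, empty mask b" line already does) remains useful for tracking down which scenes produce it.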

lix4 • Oct 12 '20 19:10