
Don't run on_validation_epoch_end() during sanity checking if val-accuracy-interval is greater than max epochs.

bw4sz opened this issue 1 year ago • 5 comments

In https://deepforest.readthedocs.io/en/v1.4.1/user_guide/09_configuration_file.html#val-accuracy-interval, we tell users that to turn off the validation callback, they should set the interval to a value greater than the number of epochs.

https://github.com/weecology/DeepForest/blob/e5f3301f6a1c931bd41fad7ab06d17b3b191df3f/src/deepforest/main.py#L718

But that's not quite true: during sanity checking, `self.current_epoch` is 0, so the check on the line above still passes. There is no reason, and it is probably dangerous, to run the validation dataloader if we never intend to call it during training.
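To illustrate the problem, here is a minimal, self-contained sketch (the guard's exact form is an assumption, not DeepForest's actual source): any check that looks only at `current_epoch` sees the same value (0) during Lightning's sanity check and during the first real epoch, so it cannot tell the two apart.

```python
# Illustration only: one plausible shape of an epoch-interval guard
# (assumption, not DeepForest's actual implementation).
def epoch_only_guard(current_epoch: int, val_accuracy_interval: int) -> bool:
    # Run validation every `val_accuracy_interval` epochs.
    return current_epoch % val_accuracy_interval == 0

# Both contexts report current_epoch == 0, so the guard answers identically:
during_sanity_check = epoch_only_guard(0, val_accuracy_interval=100)
during_first_epoch = epoch_only_guard(0, val_accuracy_interval=100)
assert during_sanity_check == during_first_epoch  # indistinguishable
```

This is why an epoch-based interval alone cannot fully disable the callback: Lightning's sanity check pass runs before training with `current_epoch == 0`, regardless of how large the interval is.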

More on sanity checking

https://lightning.ai/forums/t/validation-sanity-check/174

https://pytorch-lightning.readthedocs.io/en/0.9.0/api/pytorch_lightning.trainer.evaluation_loop.html

  1. Test and document the current behavior.
  2. Update the docs to describe what actually happens.

This issue should wait on the refactor of #858

bw4sz avatar Dec 17 '24 23:12 bw4sz

Hi @bw4sz, I have an approach to this issue. Can you let me know if I've got it right?

Approach:

  1. Modify on_validation_epoch_end()

    • Add a condition to skip validation when val_accuracy_interval > max_epochs.
    • This ensures that validation is disabled during both sanity check and training when it's not needed.
  2. Create tests to verify the behavior:

    • test_validation_disabled(): Tests when validation should be disabled
    • test_validation_enabled(): Tests when validation should be enabled
    • test_sanity_check_validation(): Specifically tests sanity check behavior
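The condition in step 1 might be sketched as follows (the helper name is hypothetical, not DeepForest's API; it only shows the interval-versus-epochs comparison):

```python
# Hedged sketch of step 1 (assumed helper, not DeepForest's actual code):
# skip the validation epilogue when the configured interval can never be
# reached within the training run.
def validation_disabled(val_accuracy_interval: int, max_epochs: int) -> bool:
    """True when the interval exceeds max_epochs, i.e. validation never fires."""
    return val_accuracy_interval > max_epochs

print(validation_disabled(100, 10))  # True: skip on_validation_epoch_end
print(validation_disabled(2, 10))    # False: run validation as scheduled
```

Note that this alone does not cover the sanity-check case discussed above, since the sanity check runs even when the interval check would skip every training epoch.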

Does this approach address the issue correctly?

Bhavya1604 avatar Mar 27 '25 07:03 Bhavya1604

Hi @reddykkeerthi (ref: #1025) and @Abhishek-Dimri, thank you both for your valuable contributions to the project! Since you've each brought useful ideas to this issue, I'd like to ask you to collaborate and submit a single, combined PR that reflects both of your approaches. Please take some time to review each other's designs and work together toward a unified solution. Also, please do not close PRs #1025 or #1036; you can update their titles to include [Duplicate] or something similar, but keep them open for reference. Looking forward to your joint effort!

henrykironde avatar Apr 20 '25 22:04 henrykironde

Hi @henrykironde, I think you might have meant to tag @reddykkeerthi instead of rhydham-nith.

Abhishek-Dimri avatar Apr 21 '25 02:04 Abhishek-Dimri

@Abhishek-Dimri thanks

henrykironde avatar Apr 21 '25 02:04 henrykironde

Hi @reddykkeerthi 👋

I used your validation condition logic as a base and added a tweak to skip validation during PyTorch Lightning’s sanity check phase, which was still triggering on_validation_epoch_end() unexpectedly.

Key Changes:

  1. Added a guard for `self.trainer.sanity_checking` (backed by the Lightning docs).
  2. Expanded test coverage:

    • test_validation_skip_during_sanity_check
    • test_validation_runs_in_standalone_mode
    • Enhanced test_validation_skipped_when_interval_exceeds_epochs
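A minimal sketch of the combined guard: `trainer.sanity_checking` is a real PyTorch Lightning `Trainer` property, but the surrounding class and function names here are illustrative stand-ins, not DeepForest's actual code.

```python
# FakeTrainer mimics the two Trainer attributes the guard reads, so this
# sketch runs without Lightning installed.
class FakeTrainer:
    def __init__(self, sanity_checking: bool, max_epochs: int):
        self.sanity_checking = sanity_checking
        self.max_epochs = max_epochs

def should_run_epoch_end(trainer, current_epoch: int,
                         val_accuracy_interval: int) -> bool:
    if trainer.sanity_checking:  # skip Lightning's pre-training sanity pass
        return False
    if val_accuracy_interval > trainer.max_epochs:  # validation disabled
        return False
    # Otherwise fall back to the usual every-N-epochs schedule.
    return (current_epoch + 1) % val_accuracy_interval == 0

print(should_run_epoch_end(FakeTrainer(True, 10), 0, 2))    # False: sanity check
print(should_run_epoch_end(FakeTrainer(False, 10), 1, 2))   # True: 2nd epoch
print(should_run_epoch_end(FakeTrainer(False, 10), 0, 100)) # False: disabled
```

Ordering the `sanity_checking` check first means the validation dataloader is never touched during the sanity pass, which addresses the original report directly.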

Your original structure was super helpful. Let me know if you'd like to collaborate further on the combined PR! Happy to iterate together.

Abhishek-Dimri avatar Apr 21 '25 04:04 Abhishek-Dimri