framework-reproducibility

Changing class name/structure changes program functionality (using TensorFlow)

Open · edwardyehuang opened this issue 3 years ago · 2 comments

For example:

In my project (https://github.com/edwardyehuang/CAR/blob/master/carnet.py), line 163 instantiates SegManaged.

Suppose I create a wrapper class that inherits from SegManaged and instantiate it on line 163 instead, e.g.:

```python
class gbb(SegManaged):
    pass
```

The training results (e.g. the loss) then differ from those of the original program. However, I found that this depends on the first letter of the wrapper class's name: if it starts with a-g (e.g. cbb, cbx, cxxx), the results differ, but if it starts with h-z, the results are the same. Note that upper/lower case has no effect.
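The thread does not establish a cause. As one purely hypothetical mechanism (an assumption, not something confirmed here), suppose a framework names each object after its lowercased class name and hands out random draws in sorted-name order. Renaming the wrapper class could then move it before or after a sibling object and shift which draw each object receives; lowercasing would also be consistent with case having no effect. The sibling name h_block and the helper init_values are invented for illustration.

```python
import random

# HYPOTHETICAL illustration only: nothing in this thread confirms that
# TensorFlow/Keras behaves this way. "h_block" is an invented sibling object.

def init_values(class_names, seed=0):
    """Give each object one random draw, visiting objects in sorted order
    of their lowercased names (the assumed mechanism)."""
    rng = random.Random(seed)
    values = {}
    for name in sorted(n.lower() for n in class_names):
        values[name] = rng.random()
    return values

original = init_values(["SegManaged", "h_block"])  # "segmanaged" sorts after "h_block"
renamed_late = init_values(["xbb", "h_block"])     # "xbb" also sorts after it
renamed_early = init_values(["gbb", "h_block"])    # "gbb" sorts before it

# The sibling keeps its draw only when the renamed class still sorts after it.
print(original["h_block"] == renamed_late["h_block"])
print(original["h_block"] == renamed_early["h_block"])
```

Under this assumption, any name that sorts before the sibling changes the sibling's draw, which would produce a first-letter boundary like the one reported. Whether anything like this happens in the CAR project is unverified.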

edwardyehuang, Apr 27 '22 15:04

Hi Edward, I've done a quick triage on this issue:

  • This issue is related to TensorFlow (not PyTorch or another framework).
  • This issue is not about run-to-run reproducibility. The reported issue is that changing the program code (here, only the name of a class) changes the program's results.
  • The source of non-reproducibility has not been isolated. It could be related to TensorFlow or it could be a bug in Python itself (or somewhere else).
  • There is no minimal/simple reproducer program available. To reproduce, it's necessary to follow the relatively complex and time-consuming installation and configuration instructions here.
  • I don't know when or if I will get around to reproducing the issue and isolating the source. The debug tool I currently have can only find differences between runs of the same program and not differences between runs of two different programs, although I think it would not be too difficult to make that work.
  • If you are able to create a simple and self-contained reproducer program (e.g. a small, single-file program that runs in a colab and uses synthetic data generated in the program), that would help to accelerate a resolution.
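The debug tool mentioned above only compares runs of the same program. As a minimal sketch (not that tool), two different programs, such as the original and the renamed-class variant, can still be compared by logging a per-step value like the loss from each and finding the first step where they disagree. The function name first_divergence and its tolerance parameters are assumptions for illustration.

```python
import math
from typing import Optional, Sequence


def first_divergence(run_a: Sequence[float], run_b: Sequence[float],
                     rel_tol: float = 0.0, abs_tol: float = 0.0) -> Optional[int]:
    """Return the index of the first step where two logged runs differ, or None.

    With the default tolerances of 0.0 the values must match exactly, which is
    what bit-exact reproducibility requires. NaN never compares equal, so a NaN
    counts as a divergence. If one run is longer, the first step that only one
    run has is reported as the divergence point.
    """
    for step, (a, b) in enumerate(zip(run_a, run_b)):
        if not math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol):
            return step
    if len(run_a) != len(run_b):
        return min(len(run_a), len(run_b))
    return None
```

Checking exact equality by default is deliberate: in a reproducibility investigation even a last-bit difference at an early step is the clue worth chasing, and the tolerances can be loosened afterwards to see how quickly the two runs drift apart.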

One question: I'm assuming this issue shows up regardless of the accelerator type you're running on (i.e. both CPU and GPU). Is that correct?

duncanriach, Apr 28 '22 22:04

I will provide a minimal reproducer (in Colab) next week.

edwardyehuang, Apr 29 '22 09:04