
The test TrainableStateTest.testForCore fails consistently

Open loopylangur opened this issue 5 years ago • 3 comments

Hi,

The test TrainableStateTest.testForCore in sonnet/src/recurrent_test.py fails consistently when I run it with either TensorFlow 1.5 or 2.0.

Is this expected or is there a dependency issue? I am on the v2 branch. Please find details below:

Error log:

=============================================================================================== test session starts ===============================================================================================
platform linux -- Python 3.7.6, pytest-5.3.2, py-1.8.1, pluggy-0.13.1
rootdir: sonnet
collected 1 item

sonnet/src/recurrent_test.py F                                                                                                                                                                              [100%]

==================================================================================================== FAILURES =====================================================================================================
_________________________________________________________________________________________ TrainableStateTest.testForCore __________________________________________________________________________________________

self = <recurrent_test.TrainableStateTest testMethod=testForCore>

    def testForCore(self):
      core = recurrent.LSTM(hidden_size=16)
      trainable_state = recurrent.TrainableState.for_core(core)
      self.assertAllClose(
>         trainable_state(batch_size=42), core.initial_state(batch_size=42))

sonnet/src/recurrent_test.py:667: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
python3.7/site-packages/tensorflow_core/python/framework/test_util.py:1153: in decorated
    return f(*args, **kwds)
python3.7/site-packages/tensorflow_core/python/framework/test_util.py:2495: in assertAllClose
    self._assertAllCloseRecursive(a, b, rtol=rtol, atol=atol, msg=msg)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <recurrent_test.TrainableStateTest testMethod=testForCore>
a = _TupleWrapper([<tf.Tensor: shape=(42, 16), dtype=float32, numpy=
array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0... 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]],
      dtype=float32)>])
b = OrderedDict([('hidden', <tf.Tensor: shape=(42, 16), dtype=float32, numpy=
array([[0., 0., 0., 0., 0., 0., 0., 0., 0., ...0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]],
      dtype=float32)>)])
rtol = 1e-06, atol = 1e-06, path = [], msg = ''

    def _assertAllCloseRecursive(self,
                                 a,
                                 b,
                                 rtol=1e-6,
                                 atol=1e-6,
                                 path=None,
                                 msg=None):
      path = path or []
      path_str = (("[" + "][".join([str(p) for p in path]) + "]") if path else "")
      msg = msg if msg else ""
    
      # Check if a and/or b are namedtuples.
      if hasattr(a, "_asdict"):
        a = a._asdict()
      if hasattr(b, "_asdict"):
        b = b._asdict()
      a_is_dict = isinstance(a, collections_abc.Mapping)
      if a_is_dict != isinstance(b, collections_abc.Mapping):
        raise ValueError("Can't compare dict to non-dict, a%s vs b%s. %s" %
>                        (path_str, path_str, msg))
E       ValueError: Can't compare dict to non-dict, a vs b.

python3.7/site-packages/tensorflow_core/python/framework/test_util.py:2418: ValueError
---------------------------------------------------------------------------------------------- Captured stderr call -----------------------------------------------------------------------------------------------
2020-01-26 22:31:35.915223: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-01-26 22:31:36.400027: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties: 
pciBusID: 0000:03:00.0 name: Quadro K620 computeCapability: 5.0
coreClock: 1.124GHz coreCount: 3 deviceMemorySize: 1.95GiB deviceMemoryBandwidth: 26.82GiB/s
2020-01-26 22:31:36.400117: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory
2020-01-26 22:31:36.400176: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcublas.so.10'; dlerror: libcublas.so.10: cannot open shared object file: No such file or directory
2020-01-26 22:31:36.400229: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcufft.so.10'; dlerror: libcufft.so.10: cannot open shared object file: No such file or directory
2020-01-26 22:31:36.400281: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcurand.so.10'; dlerror: libcurand.so.10: cannot open shared object file: No such file or directory
2020-01-26 22:31:36.400333: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusolver.so.10'; dlerror: libcusolver.so.10: cannot open shared object file: No such file or directory
2020-01-26 22:31:36.400385: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusparse.so.10'; dlerror: libcusparse.so.10: cannot open shared object file: No such file or directory
2020-01-26 22:31:36.403448: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-01-26 22:31:36.403468: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1592] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2020-01-26 22:31:36.403721: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-01-26 22:31:36.410367: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2993350000 Hz
2020-01-26 22:31:36.411301: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x556c3fef6550 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-01-26 22:31:36.411321: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-01-26 22:31:36.487384: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x556c3ff5c500 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-01-26 22:31:36.487419: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Quadro K620, Compute Capability 5.0
2020-01-26 22:31:36.487550: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-01-26 22:31:36.487560: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102]      
================================================================================================ warnings summary =================================================================================================
python3.7/site-packages/tensorflow_core/python/pywrap_tensorflow_internal.py:15
  python3.7/site-packages/tensorflow_core/python/pywrap_tensorflow_internal.py:15: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
    import imp

sonnet/src/recurrent_test.py::TrainableStateTest::testForCore
sonnet/src/recurrent_test.py::TrainableStateTest::testForCore
sonnet/src/recurrent_test.py::TrainableStateTest::testForCore
  python3.7/site-packages/tree/__init__.py:258: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated since Python 3.3,and in 3.9 it will stop working
    return _tree.flatten(structure)

-- Docs: https://docs.pytest.org/en/latest/warnings.html
========================================================================================== 1 failed, 4 warnings in 2.96s ==========================================================================================

Environment:

Ubuntu 16.04
python 3.7
numpy==1.17.4
tensorboard==2.0.2
tensorflow==2.0.0
tensorflow-datasets==1.2.0
tensorflow-estimator==2.0.1
tensorflow-gpu==2.1.0
tensorflow-metadata==0.14.0
tensorflow-probability==0.8.0rc0

loopylangur, Jan 27 '20 04:01

Thanks for your report. @superbobry is looking into this.

malcolmreynolds, Jan 27 '20 13:01

Hi @loopylangur, the failure you're seeing is a result of switching Sonnet 2 to deepmind/tree, which until recently did not work well with the wrapt.ObjectProxy objects used by tf.AutoTrackable (see deepmind/tree@66ace75ebec22c1bacb4a57446b0b5a4db254104 for the fix).

I will do a bugfix release of tree in the coming days which should fix the issue.
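
For readers following the traceback, here is a minimal stdlib-only sketch of the check that raised the `ValueError`. It is not Sonnet or TensorFlow code: `State`, `check_same_kind`, and the field names are hypothetical stand-ins, and a plain tuple is used only to imitate a structure that no longer exposes `_asdict`, as the `_TupleWrapper` value of `a` in the log appears not to.

```python
import collections
from collections import abc as collections_abc

# Hypothetical stand-in for a namedtuple-style recurrent state.
State = collections.namedtuple("State", ["hidden", "cell"])


def check_same_kind(a, b):
    """Mirrors the namedtuple/dict check quoted in the traceback."""
    # Namedtuples are converted to mappings before comparison.
    if hasattr(a, "_asdict"):
        a = a._asdict()
    if hasattr(b, "_asdict"):
        b = b._asdict()
    # One side being a mapping while the other is not is an error.
    a_is_dict = isinstance(a, collections_abc.Mapping)
    if a_is_dict != isinstance(b, collections_abc.Mapping):
        raise ValueError("Can't compare dict to non-dict")
    return True


expected = State(hidden=[0.0], cell=[0.0])
# Assumption: a plain tuple imitates a value that lost its namedtuple type.
degraded = tuple(expected)

# Two namedtuples of the same kind pass the check.
assert check_same_kind(expected, expected)

error = None
try:
    check_same_kind(degraded, expected)
except ValueError as exc:
    error = str(exc)
print(error)
```

The point of the sketch is only that the two sides must agree on being mappings after the `_asdict()` conversion; if one side has lost its namedtuple type, the comparison fails before any values are compared.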

superbobry, Jan 28 '20 20:01

@loopylangur could you upgrade tree and let us know if this helps?
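
One way to confirm which tree release is installed after upgrading, as a rough sketch. It assumes Python 3.8 or newer (for `importlib.metadata`) and that tree was installed from PyPI under the distribution name `dm-tree`; the helper name `installed_version` is hypothetical. The thread does not say which release contains the fix, so the snippet only reports what is installed.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string, or None if not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Assumption: tree is distributed on PyPI as "dm-tree".
print("dm-tree:", installed_version("dm-tree"))
```

Running this before and after `pip install --upgrade dm-tree` shows whether the upgrade actually took effect in the environment used for the tests.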

superbobry, Jan 29 '20 16:01