[Help/Bug] Issues with rl_trainer
Description
train_rl fails with different errors depending on the TensorFlow version. The Python version (tested 3.7 and 3.8) does not seem to matter. A traceback for each TensorFlow version (2.2.0 and 2.3.0) is posted below. I could not get rl_trainer to work, not even as shown in the docs [1]. Even if this turns out not to be a trax-related issue, I'd appreciate some help. Cheers :)
...
Environment information
OS: Manjaro Linux, Release 20.0.3
$ pip freeze | grep trax
trax==1.3.4
$ pip freeze | grep tensor
mesh-tensorflow==0.1.16
tensor2tensor==1.15.7
tensorboard==2.2.2
tensorboard-plugin-wit==1.7.0
tensorflow==2.2.0
tensorflow-addons==0.10.0
tensorflow-datasets==3.2.1
tensorflow-estimator==2.2.0
tensorflow-gan==2.0.0
tensorflow-hub==0.8.0
tensorflow-metadata==0.22.2
tensorflow-probability==0.7.0
tensorflow-text==2.3.0
And, for the second TensorFlow setup:
mesh-tensorflow==0.1.16
tensor2tensor==1.15.7
tensorboard==2.2.2
tensorboard-plugin-wit==1.7.0
tensorflow==2.3.0rc0
tensorflow-addons==0.10.0
tensorflow-datasets==3.2.1
tensorflow-estimator==2.3.0
tensorflow-gan==2.0.0
tensorflow-hub==0.8.0
tensorflow-metadata==0.22.2
tensorflow-probability==0.11.0
tensorflow-text==2.3.0
$ pip freeze | grep jax
jax==0.1.75
jaxlib==0.1.52
$ python -V
Python 3.7.8
And:
Python 3.8.3
For bugs: reproduction and error logs
# Steps to reproduce:
from trax.rl_trainer import train_rl
train_rl(output_dir='./models/acrobot', train_batch_size=32, eval_batch_size=32, n_epochs=1)
...
# Error logs:
Case 1: tensorflow==2.2.0:
Traceback (most recent call last):
  File "test.py", line 1, in <module>
    from trax import trainer_flags
  File "venv/lib/python3.7/site-packages/trax/__init__.py", line 18, in <module>
    from trax import data
  File "venv/lib/python3.7/site-packages/trax/data/__init__.py", line 20, in <module>
    from trax.data import tf_inputs
  File "venv/lib/python3.7/site-packages/trax/data/tf_inputs.py", line 28, in <module>
    from t5.data import preprocessors as t5_processors
  File "venv/lib/python3.7/site-packages/t5/__init__.py", line 17, in <module>
    import t5.data
  File "venv/lib/python3.7/site-packages/t5/data/__init__.py", line 17, in <module>
    import t5.data.mixtures
  File "venv/lib/python3.7/site-packages/t5/data/mixtures.py", line 22, in <module>
    import t5.data.tasks  # pylint: disable=unused-import
  File "venv/lib/python3.7/site-packages/t5/data/tasks.py", line 21, in <module>
    from t5.data.utils import Feature
  File "venv/lib/python3.7/site-packages/t5/data/utils.py", line 31, in <module>
    from t5.data import sentencepiece_vocabulary
  File "venv/lib/python3.7/site-packages/t5/data/sentencepiece_vocabulary.py", line 25, in <module>
    import tensorflow_text as tf_text
  File "venv/lib/python3.7/site-packages/tensorflow_text/__init__.py", line 21, in <module>
    from tensorflow_text.python import metrics
  File "venv/lib/python3.7/site-packages/tensorflow_text/python/metrics/__init__.py", line 20, in <module>
    from tensorflow_text.python.metrics.text_similarity_metric_ops import *
  File "venv/lib/python3.7/site-packages/tensorflow_text/python/metrics/text_similarity_metric_ops.py", line 28, in <module>
    gen_text_similarity_metric_ops = load_library.load_op_library(resource_loader.get_path_to_datafile('_text_similarity_metric_ops.so'))
  File "venv/lib/python3.7/site-packages/tensorflow/python/framework/load_library.py", line 58, in load_op_library
    lib_handle = py_tf.TF_LoadLibrary(library_filename)
tensorflow.python.framework.errors_impl.NotFoundError: venv/lib/python3.7/site-packages/tensorflow_text/python/metrics/_text_similarity_metric_ops.so: undefined symbol: _ZN10tensorflow6StatusC1ENS_5error4CodeEN4absl14lts_2020_02_2511string_viewE
Process finished with exit code 1
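A possible reading of Case 1, offered as a guess rather than a confirmed diagnosis: the first freeze listing pairs tensorflow==2.2.0 with tensorflow-text==2.3.0. tensorflow-text ships compiled ops built against the TensorFlow release with the same major.minor version, so a mismatch could produce an undefined symbol like the one above. The sketch below is a small stdlib-only check over `pip freeze` text. The helper names (`parse_minor_versions`, `find_mismatched`) and the choice of which packages to compare are assumptions made for illustration; they are not part of trax or TensorFlow.

```python
import re


def parse_minor_versions(freeze_text):
    """Map package name -> (major, minor) for each `name==X.Y...` line."""
    versions = {}
    for line in freeze_text.splitlines():
        match = re.match(r"^([A-Za-z0-9_.\-]+)==(\d+)\.(\d+)", line.strip())
        if match:
            name = match.group(1).lower()
            versions[name] = (int(match.group(2)), int(match.group(3)))
    return versions


def find_mismatched(freeze_text, base="tensorflow",
                    companions=("tensorflow-text",)):
    """Return companion packages whose major.minor differs from the base.

    Which packages count as companions is an assumption of this sketch.
    """
    versions = parse_minor_versions(freeze_text)
    base_version = versions.get(base)
    return [name for name in companions
            if name in versions and versions[name] != base_version]


# Relevant lines copied from the two freeze listings in this report.
CASE_1_FREEZE = "tensorflow==2.2.0\ntensorflow-text==2.3.0\n"
CASE_2_FREEZE = "tensorflow==2.3.0rc0\ntensorflow-text==2.3.0\n"

if __name__ == "__main__":
    print("case 1 mismatches:", find_mismatched(CASE_1_FREEZE))
    print("case 2 mismatches:", find_mismatched(CASE_2_FREEZE))
```

If the check flags tensorflow-text, installing a tensorflow-text release whose major.minor matches the installed tensorflow would be the first thing to try. Whether that fully resolves Case 1 is untested here.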
Case 2: tensorflow==2.3.0:
2020-08-04 10:51:16.285107: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory
venv/lib/python3.7/site-packages/tensorflow_addons/utils/ensure_tf_install.py:68: UserWarning: Tensorflow Addons supports using Python ops for all Tensorflow versions above or equal to 2.2.0 and strictly below 2.3.0 (nightly versions are not supported).
The versions of TensorFlow you are currently using is 2.3.0 and is not supported.
Some things might work, some things might not.
If you were to encounter a bug, do not file an issue.
If you want to make sure you're using a tested and supported configuration, either change the TensorFlow version or the TensorFlow Addons's version.
You can find the compatibility matrix in TensorFlow Addon's readme:
https://github.com/tensorflow/addons
UserWarning,
Traceback (most recent call last):
  File "test.py", line 12, in <module>
    train_rl(output_dir='./models/acrobot', train_batch_size=16, eval_batch_size=16, n_epochs=1)
  File "venv/lib/python3.7/site-packages/gin/config.py", line 1078, in gin_wrapper
    utils.augment_exception_message_and_reraise(e, err_str)
  File "venv/lib/python3.7/site-packages/gin/utils.py", line 49, in augment_exception_message_and_reraise
    six.raise_from(proxy.with_traceback(exception.__traceback__), None)
  File "<string>", line 3, in raise_from
  File "venv/lib/python3.7/site-packages/gin/config.py", line 1055, in gin_wrapper
    return fn(*new_args, **new_kwargs)
  File "venv/lib/python3.7/site-packages/trax/rl_trainer.py", line 99, in train_rl
    tf_np.set_allow_float64(FLAGS.tf_allow_float64)
  File "venv/lib/python3.7/site-packages/absl/flags/_flagvalues.py", line 491, in __getattr__
    raise _exceptions.UnparsedFlagAccessError(error_message)
absl.flags._exceptions.UnparsedFlagAccessError: Trying to access flag --tf_allow_float64 before flags were parsed.
In call to configurable 'train_rl' (<function train_rl at 0x7f6383869b90>)
Process finished with exit code 1
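For Case 2, the error itself comes from absl: a flag value was read before absl had parsed the command line. The sketch below reproduces that behavior in isolation and shows that parsing an argv list makes the flag readable. It uses a private `FlagValues` object and a made-up flag name (`demo_allow_float64`) as stand-ins. Both are assumptions for illustration, and nothing here is taken from trax's own code.

```python
from absl import flags

# Private FlagValues so this sketch does not touch the global FLAGS object.
demo_flags = flags.FlagValues()
flags.DEFINE_boolean(
    "demo_allow_float64", False,
    "Hypothetical stand-in for --tf_allow_float64.",
    flag_values=demo_flags)


def read_flag_before_parsing():
    """Return the error message raised when the flag is read too early."""
    try:
        demo_flags.demo_allow_float64
    except flags.UnparsedFlagAccessError as err:
        return str(err)
    return None


# Reading the flag now fails, because nothing has been parsed yet.
early_error = read_flag_before_parsing()

# Parsing an argv-style list (program name first) makes values readable.
remaining_argv = demo_flags(["test.py", "--demo_allow_float64"])
parsed_value = demo_flags.demo_allow_float64

if __name__ == "__main__":
    print("before parsing:", early_error)
    print("after parsing:", parsed_value, remaining_argv)
```

In a standalone script the usual absl pattern is to wrap the entry point in `app.run(main)`, which parses `sys.argv` before calling `main`. Whether train_rl is meant to be called outside such an entry point is not something this report can confirm.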
...