
Building gym_tensorflow

Open Nostrademous opened this issue 6 years ago • 19 comments

Getting the following errors from a fresh git clone, following the README instructions. Side note: I'm installing this on Ubuntu under Windows Subsystem for Linux.

(env) nostrademous@DESKTOP-J9431IB:~/ML/deep-neuroevolution/gpu_implementation/gym_tensorflow$ make clean
rm -rf gym_tensorflow.so
(env) nostrademous@DESKTOP-J9431IB:~/ML/deep-neuroevolution/gpu_implementation/gym_tensorflow$ make
g++ -std=c++11 -shared -fPIC -I/home/nostrademous/ML/env/lib/python3.6/site-packages/tensorflow/include -I/home/nostrademous/ML/env/lib/python3.6/site-packages/tensorflow/include/external/nsync/public -L/home/nostrademous/ML/env/lib/python3.6/site-packages/tensorflow/core -D_GLIBCXX_USE_CXX11_ABI=0 -O2 -DGOOGLE_CUDA=1 -Wl,-rpath=/build .//*.cpp .//ops/*.cpp -ltensorflow_framework -o gym_tensorflow.so
In file included from .//tf_env.cpp:22:0:
.//tf_env.cpp: In member function ‘virtual void EnvironmentMakeOp::Compute(tensorflow::OpKernelContext*)’:
.//tf_env.cpp:102:69: error: ‘MakeResourceHandleToOutput’ was not declared in this scope
                                     MakeTypeIndex<BaseEnvironment>()));
                                                                     ^
/home/nostrademous/ML/env/lib/python3.6/site-packages/tensorflow/include/tensorflow/core/framework/op_kernel.h:1309:29: note: in definition of macro ‘OP_REQUIRES_OK’
     ::tensorflow::Status _s(STATUS);    \
                             ^
In file included from /home/nostrademous/ML/env/lib/python3.6/site-packages/tensorflow/include/tensorflow/stream_executor/platform/mutex.h:25:0,
                 from /home/nostrademous/ML/env/lib/python3.6/site-packages/tensorflow/include/tensorflow/stream_executor/dso_loader.h:29,
                 from /home/nostrademous/ML/env/lib/python3.6/site-packages/tensorflow/include/tensorflow/core/platform/default/stream_executor.h:25,
                 from /home/nostrademous/ML/env/lib/python3.6/site-packages/tensorflow/include/tensorflow/core/platform/stream_executor.h:24,
                 from .//ops/indexedmatmul.cpp:20:
/home/nostrademous/ML/env/lib/python3.6/site-packages/tensorflow/include/tensorflow/stream_executor/platform/default/mutex.h:32:19: error: ‘tensorflow::tf_shared_lock’ has not been declared
 using tensorflow::tf_shared_lock;
                   ^
In file included from /home/nostrademous/ML/env/lib/python3.6/site-packages/tensorflow/include/tensorflow/core/platform/default/stream_executor.h:31:0,
                 from /home/nostrademous/ML/env/lib/python3.6/site-packages/tensorflow/include/tensorflow/core/platform/stream_executor.h:24,
                 from .//ops/indexedmatmul.cpp:20:
/home/nostrademous/ML/env/lib/python3.6/site-packages/tensorflow/include/tensorflow/stream_executor/stream.h: In member function ‘bool perftools::gputools::Stream::InErrorState() const’:
/home/nostrademous/ML/env/lib/python3.6/site-packages/tensorflow/include/tensorflow/stream_executor/stream.h:2005:5: error: ‘tf_shared_lock’ was not declared in this scope
     tf_shared_lock lock{mu_};
     ^
Makefile:45: recipe for target 'gym_tensorflow.so' failed
make: *** [gym_tensorflow.so] Error 1

NOTE: I do have slightly updated versions of some of the Python packages, but I don't think that's the cause of the errors I'm hitting. Here is the pip list anyway:

(env) nostrademous@DESKTOP-J9431IB:~/ML/deep-neuroevolution/gpu_implementation/gym_tensorflow$ pip list
Package        Version
-------------- -----------
absl-py        0.2.0
appdirs        1.4.3
astor          0.6.2
bleach         1.5.0
click          6.7
gast           0.2.0
grpcio         1.11.0
gym            0.9.4
h5py           2.7.0
html5lib       0.9999999
Markdown       2.6.11
mujoco-py      0.5.7
numpy          1.14.3
packaging      16.8
pip            10.0.1
protobuf       3.5.2.post1
pyglet         1.2.4
PyOpenGL       3.1.0
pyparsing      2.2.0
redis          2.10.5
requests       2.14.2
setuptools     28.8.0
six            1.11.0
tensorboard    1.8.0
tensorflow     0.12.1
tensorflow-gpu 1.8.0
termcolor      1.1.0
Werkzeug       0.14.1
wheel          0.31.0
(env) nostrademous@DESKTOP-J9431IB:~/ML/deep-neuroevolution/gpu_implementation/gym_tensorflow$

Nostrademous avatar May 11 '18 20:05 Nostrademous

I wonder if it has anything to do with the two different tensorflow versions you have installed. Try pip uninstall tensorflow and keep tensorflow-gpu as is.
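A quick way to confirm this, as a minimal sketch: the helper below lists which TensorFlow distributions pip knows about. It assumes Python 3.8+ for `importlib.metadata` (on older interpreters, `pip list` gives the same information); `installed_versions` and `find_tf_conflict` are hypothetical names, not part of the repo.

```python
from importlib import metadata  # standard library from Python 3.8 onward

# Both distributions install into the same `tensorflow` package directory,
# so having both (especially at different versions) can leave mixed headers.
TF_DISTRIBUTIONS = ("tensorflow", "tensorflow-gpu")


def installed_versions(names=TF_DISTRIBUTIONS):
    """Return {distribution name: version} for the names that are installed."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            pass  # not installed, nothing to report
    return versions


def find_tf_conflict(installed):
    """Return the TensorFlow distributions found if more than one is present."""
    found = {name: installed[name] for name in TF_DISTRIBUTIONS if name in installed}
    return found if len(found) > 1 else {}


if __name__ == "__main__":
    conflict = find_tf_conflict(installed_versions())
    if conflict:
        print("Multiple TensorFlow distributions installed:", conflict)
    else:
        print("No conflicting TensorFlow distributions found.")
```

With the pip list above, this would flag tensorflow 0.12.1 alongside tensorflow-gpu 1.8.0.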

fps7806 avatar May 11 '18 20:05 fps7806

Tried that, no luck.

(env) nostrademous@DESKTOP-J9431IB:~/ML/deep-neuroevolution/gpu_implementation/gym_tensorflow$ make
Traceback (most recent call last):
  File "<string>", line 1, in <module>
AttributeError: module 'tensorflow' has no attribute 'sysconfig'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
AttributeError: module 'tensorflow' has no attribute 'sysconfig'
g++ -std=c++11 -shared -fPIC -I -I/external/nsync/public -L -D_GLIBCXX_USE_CXX11_ABI=0 -O2 -DGOOGLE_CUDA=1 -Wl,-rpath=/build .//*.cpp .//ops/*.cpp -ltensorflow_framework -o gym_tensorflow.so
.//tf_env.cpp:22:49: fatal error: tensorflow/core/framework/op_kernel.h: No such file or directory
compilation terminated.
.//ops/indexedmatmul.cpp:7:42: fatal error: tensorflow/core/framework/op.h: No such file or directory
compilation terminated.
Makefile:45: recipe for target 'gym_tensorflow.so' failed
make: *** [gym_tensorflow.so] Error 1

Nostrademous avatar May 12 '18 13:05 Nostrademous

I did get the gym to compile after reinstalling the latest version of tensorflow (1.8.0), matching the tensorflow-gpu version I have. The previous 0.12 version came from the top-level pip requirements in the repo, which I guess are obsolete.

However, now that it compiles, I can't get ga.py or es.py to work because, apparently, even though I compiled the gym without ALE support, those files require ALE support.
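For anyone hitting the same `sysconfig` error: the empty `-I` and `-L` in the failing g++ line suggest the Makefile takes those paths from `tf.sysconfig`. Here is a minimal sketch for checking what an installation reports. `tf_build_paths` is a hypothetical helper, and it takes the module as an argument so it can be tried even without TensorFlow installed.

```python
def tf_build_paths(tf_module):
    """Return (include_dir, lib_dir) as reported by tf.sysconfig.

    Raises RuntimeError with a readable message when the module has no
    `sysconfig` attribute, which is what a broken or partial install looks like.
    """
    sysconfig = getattr(tf_module, "sysconfig", None)
    if sysconfig is None:
        raise RuntimeError(
            "tensorflow has no 'sysconfig' attribute; the installation is "
            "probably incomplete or too old. Reinstalling may help."
        )
    return sysconfig.get_include(), sysconfig.get_lib()


# Usage, assuming TensorFlow is importable:
#   import tensorflow as tf
#   include_dir, lib_dir = tf_build_paths(tf)
#   print("-I" + include_dir, "-L" + lib_dir)
```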

(env) nostrademous@DESKTOP-J9431IB:~/ML/deep-neuroevolution/gpu_implementation$ python es.py configurations/es_atari_config.json
/home/nostrademous/ML/env/lib/python3.6/site-packages/h5py/__init__.py:34: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
05/12/2018 09:29:42 AM {
    "episode_cutoff_mode": 5000,
    "game": "frostbite",
    "l2coeff": 0.005,
    "model": "ModelVirtualBN",
    "mutation_power": 0.02,
    "num_test_episodes": 200,
    "num_validation_episodes": 30,
    "optimizer": {
        "args": {
            "stepsize": 0.01
        },
        "type": "adam"
    },
    "population_size": 5000,
    "return_proc_mode": "centered_rank",
    "timesteps": 250000000.0
}
05/12/2018 09:29:42 AM Logging to: /tmp/tmp9g4m8tav
Traceback (most recent call last):
  File "es.py", line 293, in <module>
    main(**exp)
  File "es.py", line 148, in main
    worker = ConcurrentWorkers(make_env, Model, batch_size=64)
  File "/home/nostrademous/ML/deep-neuroevolution/gpu_implementation/neuroevolution/concurrent_worker.py", line 135, in __init__
    ref_batch = gym_tensorflow.get_ref_batch(make_env_f, sess, 128)
  File "/home/nostrademous/ML/deep-neuroevolution/gpu_implementation/gym_tensorflow/__init__.py", line 18, in get_ref_batch
    env = make_env_f(1)
  File "es.py", line 147, in make_env
    return gym_tensorflow.make(game=exp["game"], batch_size=b)
  File "/home/nostrademous/ML/deep-neuroevolution/gpu_implementation/gym_tensorflow/__init__.py", line 11, in make
    return StackFramesWrapper(atari.AtariEnv(game, batch_size, *args, **kwargs))
  File "/home/nostrademous/ML/deep-neuroevolution/gpu_implementation/gym_tensorflow/atari/__init__.py", line 8, in __init__
    raise NotImplementedError("gym_tensorflow was not compiled with ALE support.")
NotImplementedError: gym_tensorflow was not compiled with ALE support.
(env) nostrademous@DESKTOP-J9431IB:~/ML/deep-neuroevolution/gpu_implementation$

However, when I try to enable ALE in the Makefile and make the gym I get the following errors:

(env) nostrademous@DESKTOP-J9431IB:~/ML/deep-neuroevolution/gpu_implementation/gym_tensorflow$ make
/home/nostrademous/ML/env/lib/python3.6/site-packages/h5py/__init__.py:34: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
/home/nostrademous/ML/env/lib/python3.6/site-packages/h5py/__init__.py:34: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
g++ -std=c++11 -shared -fPIC -I/home/nostrademous/ML/env/lib/python3.6/site-packages/tensorflow/include -I/home/nostrademous/ML/env/lib/python3.6/site-packages/tensorflow/include/external/nsync/public -L/home/nostrademous/ML/env/lib/python3.6/site-packages/tensorflow -D_GLIBCXX_USE_CXX11_ABI=0 -O2 -DGOOGLE_CUDA=1 -I/home/nostrademous/ML/deep-neuroevolution/gpu_implementation/gym_tensorflow/atari-py/atari_py/ale_interface/src -I/home/nostrademous/ML/deep-neuroevolution/gpu_implementation/gym_tensorflow/atari-py/atari_py/ale_interface/src/controllers -I/home/nostrademous/ML/deep-neuroevolution/gpu_implementation/gym_tensorflow/atari-py/atari_py/ale_interface/src/os_dependent -I/home/nostrademous/ML/deep-neuroevolution/gpu_implementation/gym_tensorflow/atari-py/atari_py/ale_interface/src/environment -I/home/nostrademous/ML/deep-neuroevolution/gpu_implementation/gym_tensorflow/atari-py/atari_py/ale_interface/src/external -L/home/nostrademous/ML/deep-neuroevolution/gpu_implementation/gym_tensorflow/atari-py/atari_py/ale_interface/build -Wl,-rpath=/home/nostrademous/ML/deep-neuroevolution/gpu_implementation/gym_tensorflow/atari-py/atari_py/ale_interface/build .//*.cpp .//ops/*.cpp .//atari/*.cpp -ltensorflow_framework -lale -o gym_tensorflow.so
.//atari/tf_atari.cpp:3:29: fatal error: ale_interface.hpp: No such file or directory
compilation terminated.
Makefile:45: recipe for target 'gym_tensorflow.so' failed
make: *** [gym_tensorflow.so] Error 1

Nostrademous avatar May 12 '18 13:05 Nostrademous

Hi everyone! I got the same issue. I think it depends on which version of gcc is used to build gpu_implementation.

I found the following references:

https://github.com/Zardinality/TF-deformable-conv/issues/1

https://github.com/tensorflow/tensorflow/issues/15002

Alro10 avatar May 14 '18 18:05 Alro10

The experiments we included are for the Atari games, which require ALE support; you can follow these instructions to compile. We are in the process of adding MuJoCo support, but without ALE the only environment available is the hard maze.

fps7806 avatar May 14 '18 19:05 fps7806

Hello, has your problem been solved? I'm having the same trouble.

ylddd avatar Jul 26 '18 21:07 ylddd

Hi, everyone, I met an issue: "g++: error: unrecognized command line option ‘-Wl’", any help?

zhan0903 avatar Oct 14 '18 00:10 zhan0903

> Hi, everyone, I met an issue: "g++: error: unrecognized command line option ‘-Wl’", any help?

I'm having the same issue. Did you work it out, @zhan0903?

benjamin22-314 avatar Nov 20 '18 00:11 benjamin22-314

> Hi, everyone, I met an issue: "g++: error: unrecognized command line option ‘-Wl’", any help?

Hi @zhan0903 , I think that issue is from a typo in the 'deep-neuroevolution/gpu_implementation/gym_tensorflow/Makefile'.

line 30 is missing a "," I think it should be FLAGS+= -Wl,-rpath=$(ALE)/build instead of FLAGS+= -Wl -rpath=$(ALE)/build

benjamin22-314 avatar Nov 21 '18 00:11 benjamin22-314

Hi, FLAGS+= -Wl,-rpath=$(ALE)/build does not work for me. I am still encountering the same error. Have you solved this issue? @Nostrademous @fps7806 @ylddd

youshaox avatar Jan 25 '19 22:01 youshaox

A slight adaptation of the changes suggested by @BenjaminPhillips22 fixed it on my Linux Mint instance: FLAGS+= -Wl,-rpath,$(ALE)/build

Notice there are no spaces, and 1 extra comma.
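For context, `-Wl,` hands the comma-separated items that follow it to the linker, so a space after `-Wl` leaves g++ with a bare `-Wl` that it does not recognise. As a sketch, the substitution can be scripted. `fix_rpath_flag` is a hypothetical helper, not something in the repo:

```python
import re

# "-Wl", then whitespace, then "-rpath" followed by "=" or ",".
# A correctly joined "-Wl,-rpath..." has no whitespace and is left alone.
_BROKEN_RPATH = re.compile(r"-Wl\s+-rpath[=,]\s*")


def fix_rpath_flag(line):
    """Rewrite '-Wl -rpath=DIR' as '-Wl,-rpath,DIR' in one Makefile line."""
    return _BROKEN_RPATH.sub("-Wl,-rpath,", line)


if __name__ == "__main__":
    print(fix_rpath_flag("FLAGS+= -Wl -rpath=$(ALE)/build"))
```

Lines that already use `-Wl,-rpath=` or `-Wl,-rpath,` pass through unchanged.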

matthewzar avatar Jan 26 '19 22:01 matthewzar

> Hi, everyone, I met an issue: "g++: error: unrecognized command line option ‘-Wl’", any help?

> Hi @zhan0903 , I think that issue is from a typo in the 'deep-neuroevolution/gpu_implementation/gym_tensorflow/Makefile'.

> line 30 is missing a "," I think it should be FLAGS+= -Wl,-rpath=$(ALE)/build instead of FLAGS+= -Wl -rpath=$(ALE)/build

Compiled successfully on Ubuntu 16.04. Thanks!

lisun-ai avatar Mar 09 '19 09:03 lisun-ai

@Nostrademous I have changed it to "FLAGS+= -Wl,-rpath=$(ALE)/build" and successfully built gym_tensorflow. But have you solved the "gym_tensorflow was not compiled with ALE support" error? I have been stuck here for a long time.

Error log:

Traceback (most recent call last):
  File "es.py", line 293, in <module>
    main(**exp)
  File "es.py", line 148, in main
    worker = ConcurrentWorkers(make_env, Model, batch_size=64)
  File "/home/shawn/workspace/test/deep-neuroevolution/gpu_implementation/neuroevolution/concurrent_worker.py", line 135, in __init__
    ref_batch = gym_tensorflow.get_ref_batch(make_env_f, sess, 128)
  File "/home/shawn/workspace/test/deep-neuroevolution/gpu_implementation/gym_tensorflow/__init__.py", line 18, in get_ref_batch
    env = make_env_f(1)
  File "es.py", line 147, in make_env
    return gym_tensorflow.make(game=exp["game"], batch_size=b)
  File "/home/shawn/workspace/test/deep-neuroevolution/gpu_implementation/gym_tensorflow/__init__.py", line 11, in make
    return StackFramesWrapper(atari.AtariEnv(game, batch_size, *args, **kwargs))
  File "/home/shawn/workspace/test/deep-neuroevolution/gpu_implementation/gym_tensorflow/atari/__init__.py", line 8, in __init__
    raise NotImplementedError("gym_tensorflow was not compiled with ALE support.")
NotImplementedError: gym_tensorflow was not compiled with ALE support.

youshaox avatar Apr 07 '19 08:04 youshaox

@Nostrademous @youshaox I got the same problem "gym_tensorflow was not compiled with ALE support" error. Have you ever solved this problem?

denis-xiao avatar Apr 26 '19 03:04 denis-xiao

> @Nostrademous @youshaox I got the same problem "gym_tensorflow was not compiled with ALE support" error. Have you ever solved this problem?

That can be solved if you enable the USE_ALE option: https://github.com/uber-research/deep-neuroevolution/blob/master/gpu_implementation/gym_tensorflow/Makefile#L2

Instructions to use ALE are here: https://github.com/uber-research/deep-neuroevolution/tree/master/gpu_implementation/gym_tensorflow/atari

fps7806 avatar Apr 26 '19 17:04 fps7806

I have already set USE_ALE := 1 in the file "deep-neuroevolution/gpu_implementation/gym_tensorflow/Makefile":

USE_SDL := 0
USE_ALE := 1
USE_GPU := 1

Still, I get the above error.

Following the instructions in https://github.com/uber-research/deep-neuroevolution/tree/master/gpu_implementation/gym_tensorflow/atari:

  1. git clone https://github.com/fps7806/atari-py.git into the directory "deep-neuroevolution/gpu_implementation/gym_tensorflow".
  2. cd ./atari-py && make
  3. set USE_ALE := 1 in the file "deep-neuroevolution/gpu_implementation/gym_tensorflow/Makefile".
  4. cd ./gym_tensorflow && make
  5. python es.py configurations/es_atari_config.json

I still get the above error.
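Before step 4, it may be worth checking that the ALE header and library are where the ALE-enabled compile line earlier in this thread looks for them. This is a minimal sketch, assuming the atari-py layout implied by those `-I`/`-L` paths. `missing_ale_files` is a hypothetical helper, and the `libale*` file name is a guess based on `-lale`:

```python
from pathlib import Path


def missing_ale_files(gym_tensorflow_dir):
    """Return the ALE paths the build seems to need that are not present.

    Assumed layout (taken from the -I/-L flags in the compile line):
      <gym_tensorflow>/atari-py/atari_py/ale_interface/src/ale_interface.hpp
      <gym_tensorflow>/atari-py/atari_py/ale_interface/build/libale*
    """
    ale = Path(gym_tensorflow_dir) / "atari-py" / "atari_py" / "ale_interface"
    missing = []

    header = ale / "src" / "ale_interface.hpp"
    if not header.is_file():
        missing.append(str(header))

    build = ale / "build"
    # -lale resolves to a file named libale.so or libale.a in the -L directory.
    if not (build.is_dir() and any(build.glob("libale*"))):
        missing.append(str(build / "libale*"))

    return missing


if __name__ == "__main__":
    for path in missing_ale_files("."):
        print("missing:", path)
```

Run from inside gym_tensorflow; an empty result means both files were found.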

Error log:

2019-04-27 08:02:27.225223: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8790 MB memory) -> physical GPU (device: 0, name: Tesla K40c, pci bus id: 0000:03:00.0, compute capability: 3.5)
Traceback (most recent call last):
  File "es.py", line 293, in <module>
    main(**exp)
  File "es.py", line 148, in main
    worker = ConcurrentWorkers(make_env, Model, batch_size=64)
  File "/home/shawn/workspace/research/deep-neuroevolution/gpu_implementation/neuroevolution/concurrent_worker.py", line 135, in __init__
    ref_batch = gym_tensorflow.get_ref_batch(make_env_f, sess, 128)
  File "/home/shawn/workspace/research/deep-neuroevolution/gpu_implementation/gym_tensorflow/__init__.py", line 18, in get_ref_batch
    env = make_env_f(1)
  File "es.py", line 147, in make_env
    return gym_tensorflow.make(game=exp["game"], batch_size=b)
  File "/home/shawn/workspace/research/deep-neuroevolution/gpu_implementation/gym_tensorflow/__init__.py", line 11, in make
    return StackFramesWrapper(atari.AtariEnv(game, batch_size, *args, **kwargs))
  File "/home/shawn/workspace/research/deep-neuroevolution/gpu_implementation/gym_tensorflow/atari/__init__.py", line 8, in __init__
    raise NotImplementedError("gym_tensorflow was not compiled with ALE support.")
NotImplementedError: gym_tensorflow was not compiled with ALE support.

youshaox avatar Apr 26 '19 22:04 youshaox

> I have already set USE_ALE=1 in the file "deep-neuroevolution/gpu_implementation/gym_tensorflow/Makefile". USE_SDL := 0 USE_ALE := 1 USE_GPU := 1

> Still get the above error.

Interesting. Can you try running cd ./gym_tensorflow && make clean && make?

fps7806 avatar Apr 26 '19 22:04 fps7806

Running python ga.py -c configurations/ga_atari_config.json -o out gives the following error. I have tried most of the suggestions discussed above.

tensorflow.python.framework.errors_impl.NotFoundError: /home/administrator/Hands-on-Neuroevolution-with-Python/Chapter10/gym_tensorflow/gym_tensorflow.so: undefined symbol: ZN10tensorflow11ResourceMgr8DoDeleteERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESt10type_indexS8
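One possible reading, offered as a guess: the missing symbol mentions `std::__cxx11::basic_string`, the string type of libstdc++'s newer ABI, so gym_tensorflow.so may have been built with a different `_GLIBCXX_USE_CXX11_ABI` setting (or against a different TensorFlow version) than the TensorFlow being loaded. Here is a small sketch for spotting this in an error message. `undefined_symbol` and `uses_cxx11_abi` are hypothetical helpers:

```python
import re


def undefined_symbol(error_message):
    """Extract the name after 'undefined symbol:' from a loader error, or None."""
    match = re.search(r"undefined symbol:\s*(\S+)", error_message)
    return match.group(1) if match else None


def uses_cxx11_abi(symbol):
    """True if a mangled C++ name refers to libstdc++'s std::__cxx11 namespace."""
    return symbol is not None and "__cxx11" in symbol


if __name__ == "__main__":
    message = (
        "gym_tensorflow.so: undefined symbol: "
        "ZN10tensorflow11ResourceMgr8DoDeleteERKNSt7__cxx1112basic_string"
        "IcSt11char_traitsIcESaIcEEESt10type_indexS8"
    )
    symbol = undefined_symbol(message)
    print(symbol)
    print("references the new libstdc++ ABI:", uses_cxx11_abi(symbol))
```

If the check is positive, a reasonable next step would be to compare the `-D_GLIBCXX_USE_CXX11_ABI` value in the Makefile with the one printed by `tf.sysconfig.get_compile_flags()`, in TensorFlow versions that provide it.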

vijnasu avatar Feb 20 '20 11:02 vijnasu

> Hi, everyone, I met an issue: "g++: error: unrecognized command line option ‘-Wl’", any help?

> Hi @zhan0903 , I think that issue is from a typo in the 'deep-neuroevolution/gpu_implementation/gym_tensorflow/Makefile'.

> line 30 is missing a "," I think it should be FLAGS+= -Wl,-rpath=$(ALE)/build instead of FLAGS+= -Wl -rpath=$(ALE)/build

I am still having some trouble with the same error. Does somebody know how to resolve it?

thisisjasleen avatar Feb 24 '20 07:02 thisisjasleen