
GPU integration for macOS 12.3 on M1 Max

Open doric35 opened this issue 3 years ago • 0 comments

I ran the quickstart.py example and got the following error:

Metal device set to: Apple M1 Max

systemMemory: 32.00 GB
maxCacheSize: 10.67 GB

WARNING:root:Infinite min_value bound for state.
Episodes: 0%| | 0/200 [00:00, return=0.00, ts/ep=0, sec/ep=0.00, ms/ts=0.0, agent=0.0%]
Traceback (most recent call last):
  File "/Users/dominikrichard/workspace/minesweeping/minesweepingpython/main/tensforce_testing.py", line 53, in <module>
    main()
  File "/Users/dominikrichard/workspace/minesweeping/minesweepingpython/main/tensforce_testing.py", line 46, in main
    runner.run(num_episodes=200)
  File "/opt/homebrew/anaconda3/envs/TensFenv/lib/python3.10/site-packages/tensorforce/execution/runner.py", line 649, in run
    self.handle_act(parallel=n)
  File "/opt/homebrew/anaconda3/envs/TensFenv/lib/python3.10/site-packages/tensorforce/execution/runner.py", line 697, in handle_act
    actions = self.agent.act(states=self.states[parallel], parallel=parallel)
  File "/opt/homebrew/anaconda3/envs/TensFenv/lib/python3.10/site-packages/tensorforce/agents/agent.py", line 415, in act
    return super().act(
  File "/opt/homebrew/anaconda3/envs/TensFenv/lib/python3.10/site-packages/tensorforce/agents/recorder.py", line 262, in act
    actions, internals = self.fn_act(
  File "/opt/homebrew/anaconda3/envs/TensFenv/lib/python3.10/site-packages/tensorforce/agents/agent.py", line 462, in fn_act
    actions, timesteps = self.model.act(
  File "/opt/homebrew/anaconda3/envs/TensFenv/lib/python3.10/site-packages/tensorforce/core/module.py", line 136, in decorated
    output_args = function_graphs[str(graph_params)](*graph_args)
  File "/opt/homebrew/anaconda3/envs/TensFenv/lib/python3.10/site-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/opt/homebrew/anaconda3/envs/TensFenv/lib/python3.10/site-packages/tensorflow/python/eager/execute.py", line 54, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx.handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation agent/VerifyFinite/CheckNumerics: Could not satisfy explicit device specification '' because the node {{colocation_node agent/VerifyFinite/CheckNumerics}} was colocated with a group of nodes that required incompatible device '/job:localhost/replica:0/task:0/device:GPU:0'. All available devices [/job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:GPU:0].
Colocation Debug Info:
Colocation group had the following types and supported devices:
Root Member(assigned_device_name_index=1 requested_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' assigned_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' resource_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]
Identity: GPU CPU
Switch: GPU CPU
CheckNumerics: CPU
_Arg: GPU CPU

Colocation members, user-requested devices, and framework assigned devices, if any:
  args_0 (_Arg) framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
  agent/VerifyFinite/CheckNumerics (CheckNumerics)
  agent/VerifyFinite/control_dependency (Identity)
  agent/assert_greater_equal/Assert/AssertGuard/args_0/_16 (Switch)
  agent/assert_less_equal/Assert/AssertGuard/args_0/_26 (Switch)
  Func/agent/StatefulPartitionedCall/input/_80 (Identity) /job:localhost/replica:0/task:0/device:GPU:0
  Func/agent/assert_greater_equal/Assert/AssertGuard/then/_10/input/_153 (Identity)
  Func/agent/assert_greater_equal/Assert/AssertGuard/else/_11/input/_159 (Identity)
  Func/agent/assert_less_equal/Assert/AssertGuard/then/_20/input/_165 (Identity)
  Func/agent/assert_less_equal/Assert/AssertGuard/else/_21/input/_171 (Identity)
  Func/agent/StatefulPartitionedCall/state_preprocessing/PartitionedCall/input/_260 (Identity) /job:localhost/replica:0/task:0/device:GPU:0
  Func/agent/StatefulPartitionedCall/state_preprocessing/PartitionedCall/linear_normalization0/PartitionedCall/input/_356 (Identity) /job:localhost/replica:0/task:0/device:GPU:0

     [[{{node agent/VerifyFinite/CheckNumerics}}]] [Op:__inference_act_1848]

Episodes: 0%| | 0/200 [00:00, return=0.00, ts/ep=0, sec/ep=0.00, ms/ts=0.0, agent=0.0%]
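
As far as I can read the colocation message, the group was pinned to GPU:0 while one of its op types only lists CPU. Below is a minimal sketch in plain Python for picking that out of the debug info. It is not part of Tensorforce or TensorFlow; the names `parse_support`, `device_type`, and `unsupported_ops` are hypothetical helpers written only for illustration, and the two input strings are copied from the error output above.

```python
import re

# Op-type support list and requested device, copied from the error output.
SUPPORT_LINE = "Identity: GPU CPU Switch: GPU CPU CheckNumerics: CPU _Arg: GPU CPU"
REQUESTED_DEVICE = "/job:localhost/replica:0/task:0/device:GPU:0"


def parse_support(line):
    """Map each op type to the set of device types listed after it."""
    support = {}
    current = None
    for token in line.split():
        if token.endswith(":"):
            # A token such as "Identity:" starts a new op-type entry.
            current = token[:-1]
            support[current] = set()
        elif current is not None:
            support[current].add(token)
    return support


def device_type(device_name):
    """Extract the device type (e.g. 'GPU') from a full device name."""
    match = re.search(r"device:([A-Za-z]+):\d+", device_name)
    return match.group(1) if match else None


def unsupported_ops(line, device_name):
    """Return the op types that do not list the requested device type."""
    wanted = device_type(device_name)
    return sorted(
        op for op, devices in parse_support(line).items() if wanted not in devices
    )


print(unsupported_ops(SUPPORT_LINE, REQUESTED_DEVICE))  # ['CheckNumerics']
```

If this reading is right, CheckNumerics is the only op type in the group without GPU support, which would explain why the whole group cannot be placed on GPU:0.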


I installed Tensorforce for M1 Mac in a new Conda environment, following this guide: https://tensorforce.readthedocs.io/en/latest/basics/installation.html

I also had to upgrade numpy to 1.22 to run the code.

My Conda environment is built as follows:

Name                     Version       Build               Channel
absl-py                  1.2.0         pypi_0              pypi
astunparse               1.6.3         pypi_0              pypi
blas                     1.0           openblas
bzip2                    1.0.8         h620ffc9_4
c-ares                   1.18.1        h1a28f6b_0
ca-certificates          2022.07.19    hca03da5_0
cachetools               5.2.0         pypi_0              pypi
certifi                  2022.6.15     py310hca03da5_0
charset-normalizer       2.1.0         pypi_0              pypi
cloudpickle              2.1.0         pypi_0              pypi
cycler                   0.11.0        pypi_0              pypi
flatbuffers              1.12          pypi_0              pypi
fonttools                4.34.4        pypi_0              pypi
gast                     0.4.0         pypi_0              pypi
google-auth              2.10.0        pypi_0              pypi
google-auth-oauthlib     0.4.6         pypi_0              pypi
google-pasta             0.2.0         pypi_0              pypi
grpcio                   1.42.0        py310h95c9599_0
gym                      0.21.0        pypi_0              pypi
h5py                     3.6.0         py310h181c318_0
hdf5                     1.12.1        h160e8cb_2
idna                     3.3           pypi_0              pypi
keras                    2.9.0         pypi_0              pypi
keras-preprocessing      1.1.2         pypi_0              pypi
kiwisolver               1.4.4         pypi_0              pypi
krb5                     1.19.2        h3b8d789_0
libclang                 14.0.6        pypi_0              pypi
libcurl                  7.84.0        hc6d1d07_0
libcxx                   12.0.0        hf6beb65_1
libedit                  3.1.20210910  h1a28f6b_0
libev                    4.33          h1a28f6b_1
libffi                   3.4.2         hc377ac9_4
libgfortran              5.0.0         11_2_0_he6877d6_26
libgfortran5             11.2.0        he6877d6_26
libnghttp2               1.46.0        h95c9599_0
libopenblas              0.3.20        hea475bc_0
libssh2                  1.10.0        hf27765b_0
llvm-openmp              12.0.0        haf9daa7_1
markdown                 3.4.1         pypi_0              pypi
markupsafe               2.1.1         pypi_0              pypi
matplotlib               3.5.1         pypi_0              pypi
msgpack                  1.0.3         pypi_0              pypi
msgpack-numpy            0.4.7.1       pypi_0              pypi
ncurses                  6.3           h1a28f6b_3
numpy                    1.22.0        pypi_0              pypi
oauthlib                 3.2.0         pypi_0              pypi
openssl                  1.1.1q        h1a28f6b_0
opt-einsum               3.3.0         pypi_0              pypi
packaging                21.3          pypi_0              pypi
pillow                   9.2.0         pypi_0              pypi
pip                      22.1.2        py310hca03da5_0
protobuf                 3.19.4        pypi_0              pypi
pyasn1                   0.4.8         pypi_0              pypi
pyasn1-modules           0.2.8         pypi_0              pypi
pyparsing                3.0.9         pypi_0              pypi
python                   3.10.4        hbdb9e5c_0
python-dateutil          2.8.2         pypi_0              pypi
readline                 8.1.2         h1a28f6b_1
requests                 2.28.1        pypi_0              pypi
requests-oauthlib        1.3.1         pypi_0              pypi
rsa                      4.9           pypi_0              pypi
setuptools               61.2.0        py310hca03da5_0
six                      1.15.0        pypi_0              pypi
sqlite                   3.39.2        h1058600_0
tensorboard              2.9.1         pypi_0              pypi
tensorboard-data-server  0.6.1         pypi_0              pypi
tensorboard-plugin-wit   1.8.1         pypi_0              pypi
tensorflow-deps          2.8.0         0                   apple
tensorflow-estimator     2.9.0         pypi_0              pypi
tensorflow-macos         2.9.2         pypi_0              pypi
tensorflow-metal         0.5.0         pypi_0              pypi
tensorforce              0.6.5         pypi_0              pypi
termcolor                1.1.0         pypi_0              pypi
tk                       8.6.12        hb8d0fd4_0
tqdm                     4.62.3        pypi_0              pypi
typing-extensions        4.3.0         pypi_0              pypi
tzdata                   2022a         hda174b7_0
urllib3                  1.26.11       pypi_0              pypi
werkzeug                 2.2.2         pypi_0              pypi
wheel                    0.37.1        pyhd3eb1b0_0
wrapt                    1.14.1        pypi_0              pypi
xz                       5.2.5         h1a28f6b_1
zlib                     1.2.12        h5a0b063_2


Is there any way to address this issue? I also tried downgrading Python to 3.9, which did not work. Is macOS not supposed to be supported when using tensorflow-metal?

Thank you.

doric35, Aug 11 '22 16:08