
"python run.py" fails with an error. How can I fix it?

Open dAItime001 opened this issue 1 year ago • 2 comments

INFO:jax._src.xla_bridge:Unable to initialize backend 'cuda': Found cuBLAS version 120001, but JAX was built against version 120304, which is newer. The copy of cuBLAS that is installed must be at least as new as the version against which JAX was built.
INFO:jax._src.xla_bridge:Unable to initialize backend 'rocm': NOT_FOUND: Could not find registered platform with name: "rocm". Available platform names are: CUDA
INFO:jax._src.xla_bridge:Unable to initialize backend 'tpu': INTERNAL: Failed to open libtpu.so: libtpu.so: cannot open shared object file: No such file or directory
WARNING:jax._src.xla_bridge:CUDA backend failed to initialize: Found cuBLAS version 120001, but JAX was built against version 120304, which is newer. The copy of cuBLAS that is installed must be at least as new as the version against which JAX was built. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)
INFO:rank:Initializing mesh for self.local_mesh_config=(1, 8) self.between_hosts_config=(1, 1)...
INFO:rank:Detected 1 devices in mesh
Traceback (most recent call last):
  File "/dev/grok-1/run.py", line 72, in <module>
    main()
  File "/dev/grok-1/run.py", line 63, in main
    inference_runner.initialize()
  File "/dev/grok-1/runners.py", line 282, in initialize
    runner.initialize(
  File "/dev/grok-1/runners.py", line 181, in initialize
    self.mesh = make_mesh(self.local_mesh_config, self.between_hosts_config)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/dev/grok-1/runners.py", line 586, in make_mesh
    device_mesh = mesh_utils.create_hybrid_device_mesh(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/envs/xai/lib/python3.12/site-packages/jax/experimental/mesh_utils.py", line 373, in create_hybrid_device_mesh
    per_granule_meshes = [create_device_mesh(mesh_shape, granule)
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/envs/xai/lib/python3.12/site-packages/jax/experimental/mesh_utils.py", line 302, in create_device_mesh
    raise ValueError(f'Number of devices {len(devices)} must equal the product '
ValueError: Number of devices 1 must equal the product of mesh_shape (1, 8)

dAItime001, Mar 26 '24 02:03

See https://github.com/xai-org/grok-1/blob/7050ed204b8206bb8645c7b7bbef7252f79561b0/run.py#L60 and change local_mesh_config to (1, 1), so that the mesh shape matches the single device JAX detected.
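
The check that raised the ValueError can be reproduced without JAX. The following is a minimal sketch, not code from grok-1: mesh_fits and pick_local_mesh are hypothetical helper names, and it assumes all local devices should go on the second mesh axis, as in the (1, 8) default shown in the log.

```python
import math


def mesh_fits(num_devices, mesh_shape):
    """Return True when the mesh shape uses exactly num_devices devices.

    Mirrors the condition in the traceback: the number of devices must
    equal the product of mesh_shape.
    """
    return math.prod(mesh_shape) == num_devices


def pick_local_mesh(num_devices):
    """Hypothetical helper: place every local device on the second axis."""
    return (1, num_devices)


# The failing case from the log: 1 detected device, mesh shape (1, 8).
print(mesh_fits(1, (1, 8)))   # False, 1 != 1 * 8
# The suggested fix: mesh shape (1, 1).
print(mesh_fits(1, (1, 1)))   # True
print(pick_local_mesh(1))     # (1, 1)
```

In a real run, num_devices would come from jax.device_count(). Note that the log shows the CUDA backend failing to initialize because the installed cuBLAS (120001) is older than the version JAX was built against (120304), which is likely why only 1 device was detected. Shrinking the mesh removes the ValueError, but whether the model then runs on that one device is a separate question.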

zcobol, Mar 26 '24 04:03

Not an issue

Aareon, Mar 27 '24 04:03