Mac M3: server crashes with any model
I am able to run python -m petals.cli.run_server meta-llama/Meta-Llama-3.1-405B-Instruct --num_blocks 2 --max_disk_space=50G
for a bit, but it always eventually exits with an AssertionError: Span served by this server is not present in the DHT.
System info:
Apple M3
16 GB
pyenv python v3.12
zsh
Installation method: pipx install --python=${HOME}/.pyenv/versions/3.12.2/bin/python petals
Other errors in the stdout:
${HOME}/.pyenv/versions/3.12.2/lib/python3.12/site-packages/google/protobuf/runtime_version.py:112: UserWarning: Protobuf gencode version 5.27.2 is older than the runtime version 5.28.0 at runtime.proto. Please avoid checked-in Protobuf gencode that can be obsolete.
warnings.warn(
${HOME}/.pyenv/versions/3.12.2/lib/python3.12/site-packages/google/protobuf/runtime_version.py:112: UserWarning: Protobuf gencode version 5.27.2 is older than the runtime version 5.28.0 at crypto.proto. Please avoid checked-in Protobuf gencode that can be obsolete.
warnings.warn(
${HOME}/.pyenv/versions/3.12.2/lib/python3.12/site-packages/google/protobuf/runtime_version.py:112: UserWarning: Protobuf gencode version 5.27.2 is older than the runtime version 5.28.0 at p2pd.proto. Please avoid checked-in Protobuf gencode that can be obsolete.
warnings.warn(
${HOME}/.pyenv/versions/3.12.2/lib/python3.12/site-packages/google/protobuf/runtime_version.py:112: UserWarning: Protobuf gencode version 5.27.2 is older than the runtime version 5.28.0 at averaging.proto. Please avoid checked-in Protobuf gencode that can be obsolete.
warnings.warn(
${HOME}/.pyenv/versions/3.12.2/lib/python3.12/site-packages/google/protobuf/runtime_version.py:112: UserWarning: Protobuf gencode version 5.27.2 is older than the runtime version 5.28.0 at dht.proto. Please avoid checked-in Protobuf gencode that can be obsolete.
warnings.warn(
${HOME}/.pyenv/versions/3.12.2/lib/python3.12/site-packages/google/protobuf/runtime_version.py:112: UserWarning: Protobuf gencode version 5.27.2 is older than the runtime version 5.28.0 at auth.proto. Please avoid checked-in Protobuf gencode that can be obsolete.
warnings.warn(
Sep 11 17:16:48.795 [INFO] Running Petals 2.3.0.dev2
Sep 11 17:16:49.602 [INFO] Make sure you follow the Llama terms of use: https://llama.meta.com/llama3/license, https://llama.meta.com/llama2/license
Sep 11 17:16:49.602 [INFO] Using DHT prefix: Meta-Llama-3-1-405B-Instruct-hf
Sep 11 17:17:09.101 [INFO] This server is accessible via relays
Sep 11 17:17:13.146 [INFO] Connecting to the public swarm
Sep 11 17:17:13.147 [INFO] Running a server on <REDACTED>
Sep 11 17:17:13.164 [WARN] [petals.server.server.__init__:178] Type bfloat16 is not supported on MPS, using float16 instead
Sep 11 17:17:13.164 [INFO] Model weights are loaded in float16 format
Sep 11 17:17:13.165 [INFO] Attention cache for all blocks will consume up to 0.12 GiB
Sep 11 17:17:13.165 [INFO] Loading throughput info
Sep 11 17:17:13.166 [INFO] Reporting throughput: 13.5 tokens/sec for 2 blocks
Sep 11 17:17:17.462 [INFO] Announced that blocks [0, 1] are joining
Sep 11 17:17:28.173 [INFO] Loaded meta-llama/Meta-Llama-3.1-405B-Instruct block 0
'NoneType' object has no attribute 'cadam32bit_grad_fp32'
Sep 11 17:17:52.718 [INFO] Loaded meta-llama/Meta-Llama-3.1-405B-Instruct block 1
Sep 11 17:18:06.367 [INFO] Detected a NAT or a firewall, connecting to libp2p relays. This takes a few minutes
Sep 11 17:36:45.452 [WARN] [petals.server.reachability.validate_reachability:40] Skipping reachability check because health.petals.dev is down: ReadTimeout(ReadTimeoutError("HTTPSConnectionPool(host='health.petals.dev', port=443): Read timed out. (read timeout=10)"))
Sep 11 17:36:45.707 [INFO] Started
Sep 11 17:44:13.840 [INFO] Announced that blocks ['Meta-Llama-3-1-405B-Instruct-hf.0', 'Meta-Llama-3-1-405B-Instruct-hf.1'] are offline
Sep 11 17:44:13.948 [INFO] Shutting down
Sep 11 17:44:13.952 [INFO] Module container shut down successfully
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "${HOME}/.pyenv/versions/3.12.2/lib/python3.12/site-packages/petals/cli/run_server.py", line 235, in <module>
    main()
  File "${HOME}/.pyenv/versions/3.12.2/lib/python3.12/site-packages/petals/cli/run_server.py", line 227, in main
    server.run()
  File "${HOME}/.pyenv/versions/3.12.2/lib/python3.12/site-packages/petals/server/server.py", line 378, in run
    if self._should_choose_other_blocks():
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "${HOME}/.pyenv/versions/3.12.2/lib/python3.12/site-packages/petals/server/server.py", line 418, in _should_choose_other_blocks
    return block_selection.should_choose_other_blocks(self.dht.peer_id, module_infos, self.balance_quality)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "${HOME}/.pyenv/versions/3.12.2/lib/python3.12/site-packages/petals/server/block_selection.py", line 51, in should_choose_other_blocks
    assert local_peer_id in spans, "Span served by this server is not present in the DHT"
AssertionError: Span served by this server is not present in the DHT
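
To make the failing condition easier to discuss, here is a minimal sketch of the check on the last traceback line (local_peer_id in spans). It is not the Petals implementation: the record layout, build_spans, and local_span_is_visible are hypothetical names made up for illustration. The only thing taken from the traceback is that the server looks for its own peer ID among the spans derived from DHT records and asserts when it is absent. The sketch assumes each peer serves one contiguous range of blocks.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for DHT records: one dict per block, mapping
# peer ID -> True if that peer currently announces the block.
BlockRecords = List[Dict[str, bool]]


def build_spans(records: BlockRecords) -> Dict[str, Tuple[int, int]]:
    """Map each announcing peer to the (start, end) block range it serves."""
    spans: Dict[str, Tuple[int, int]] = {}
    for block_idx, peers in enumerate(records):
        for peer_id, online in peers.items():
            if not online:
                continue
            # Keep the first block seen as the start, extend the end.
            start, _ = spans.get(peer_id, (block_idx, block_idx))
            spans[peer_id] = (start, block_idx + 1)
    return spans


def local_span_is_visible(local_peer_id: str, records: BlockRecords) -> bool:
    """Same membership test as the assertion, returned instead of asserted."""
    return local_peer_id in build_spans(records)


# Case 1: the server's own announcements are present in the records.
healthy = [{"local": True, "other": True}, {"local": True}]
# Case 2: the server's announcements are missing (an assumption about the
# cause, for example records that expired or were never stored).
stale = [{"other": True}, {}]

print(local_span_is_visible("local", healthy))
print(local_span_is_visible("local", stale))
```

In the second case the membership test fails, which is the situation the assertion reports. Whether that matches what happened here (for example because of the long relay connection delay visible in the log) is a guess and would need confirmation from the maintainers.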