openpi

Error occurs when running remote inference

Open Addog666 opened this issue 1 month ago • 2 comments

I ran this command:

uv run scripts/serve_policy.py policy:checkpoint --policy.config=pi05_libero --policy.dir=gs://openpi-assets/checkpoints/pi0_base

but it throws an error:

INFO:root:Loading model...
INFO:2025-10-21 05:46:22,801:jax._src.xla_bridge:925: Unable to initialize backend 'rocm': module 'jaxlib.xla_extension' has no attribute 'GpuAllocatorConfig'
INFO:jax._src.xla_bridge:Unable to initialize backend 'rocm': module 'jaxlib.xla_extension' has no attribute 'GpuAllocatorConfig'
INFO:2025-10-21 05:46:22,801:jax._src.xla_bridge:925: Unable to initialize backend 'tpu': INTERNAL: Failed to open libtpu.so: libtpu.so: cannot open shared object file: No such file or directory
INFO:jax._src.xla_bridge:Unable to initialize backend 'tpu': INTERNAL: Failed to open libtpu.so: libtpu.so: cannot open shared object file: No such file or directory
INFO:absl:orbax-checkpoint version: 0.11.13
INFO:absl:Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7fccf0f73ad0>
INFO:absl:Restoring checkpoint from /home/wjg/.cache/openpi/openpi-assets/checkpoints/pi0_base/params.
INFO:absl:[thread=MainThread] Failed to get flag value for EXPERIMENTAL_ORBAX_USE_DISTRIBUTED_PROCESS_ID.
INFO:absl:[process=0][thread=MainThread] No metadata found for any process_index, checkpoint_dir=/home/wjg/.cache/openpi/openpi-assets/checkpoints/pi0_base/params. time elapsed=0.00042700767517089844 seconds. If the checkpoint does not contain jax.Array then it is expected. If checkpoint contains jax.Array then it should lead to an error eventually; if no error is raised then it is a bug.
INFO:absl:[process=0] /jax/checkpoint/read/bytes_per_sec: 3.7 GiB/s (total bytes: 48.3 GiB) (time elapsed: 13 seconds) (per-host)
INFO:absl:Finished restoring checkpoint in 13.18 seconds from /home/wjg/.cache/openpi/openpi-assets/checkpoints/pi0_base/params.
Traceback (most recent call last):
  File "/data/wjg_files/openpi/scripts/serve_policy.py", line 122, in <module>
    main(tyro.cli(Args))
  File "/data/wjg_files/openpi/scripts/serve_policy.py", line 100, in main
    policy = create_policy(args)
             ^^^^^^^^^^^^^^^^^^^
  File "/data/wjg_files/openpi/scripts/serve_policy.py", line 92, in create_policy
    return _policy_config.create_trained_policy(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/wjg_files/openpi/src/openpi/policies/policy_config.py", line 57, in create_trained_policy
    model = train_config.model.load(_model.restore_params(checkpoint_dir / "params", dtype=jnp.bfloat16))
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/wjg_files/openpi/src/openpi/models/model.py", line 239, in load
    at.check_pytree_equality(expected=state.to_pure_dict(), got=params, check_shapes=True, check_dtypes=False)
  File "/data/wjg_files/openpi/src/openpi/shared/array_typing.py", line 70, in check_pytree_equality
    raise ValueError(
ValueError: PyTrees have different structure:

  • at keypath '': expected <class 'dict'> with 5 children, got <class 'dict'> with 3 children, so the numbers of children do not match, with the symmetric difference of key sets: {'time_mlp_out' 'time_mlp_in'}.
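To make the message concrete, here is a minimal sketch of the top-level comparison it describes, using plain dictionaries. The key names `a`, `b`, and `c` and the helper `top_level_key_diff` are hypothetical placeholders, not the real parameter names or openpi code; only `time_mlp_in` and `time_mlp_out` come from the error above. Since 5 children were expected and 3 were found, the two differing keys appear to be ones that the `pi05_libero` config expects and the restored checkpoint does not contain.

```python
def top_level_key_diff(expected: dict, got: dict) -> set:
    """Return the keys that appear in exactly one of the two dicts."""
    return set(expected) ^ set(got)


# Hypothetical stand-ins: "a", "b", "c" are made-up names for the shared entries.
expected = {"a": 0, "b": 0, "c": 0, "time_mlp_in": 0, "time_mlp_out": 0}
got = {"a": 0, "b": 0, "c": 0}

missing = top_level_key_diff(expected, got)
print(len(expected), len(got), sorted(missing))
```

This only mirrors the structure reported in the error; it does not confirm whether `pi05_libero` and the `pi0_base` checkpoint are meant to be used together.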

How can I solve it?

Addog666 · Oct 21 '25 05:10