ManiSkill
Bounds for action space
Hi, thank you for this great work.
In the TensorFlow version of your dataset (maniskill_dataset_converted_externally_to_rlds, https://www.tensorflow.org/datasets/catalog/maniskill_dataset_converted_externally_to_rlds), what are the upper and lower bounds of the 7 parameters in steps/action? Also, do these 7 parameters describe where the gripper will be in the next step?
Many thanks in advance.
Actually, the dataset was converted as part of the Open X-Embodiment dataset. The action is unbounded. action[:3] is the xyz movement of the tool-center point (tcp, the center between the two Panda gripper fingers), action[3:6] is the axis-angle movement of the tcp, and action[6:7] is the gripper open/close command. 0.1 times the xyz values maps to meters, and 0.1 times the axis-angle values maps to a rotation about the same axis with its norm in radians (due to the controller configuration below).
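For illustration, here is a minimal sketch (not official dataset code; the helper name is just illustrative) of how one could unpack a single 7-dim action following the description above:

import numpy as np

def interpret_action(action: np.ndarray, scale: float = 0.1):
    """Split one 7-dim action from the dataset into physical quantities."""
    delta_pos_m = scale * action[:3]      # tcp translation, in meters
    rotvec = scale * action[3:6]          # axis-angle: direction = axis, norm = angle in radians
    angle_rad = np.linalg.norm(rotvec)
    axis = rotvec / angle_rad if angle_rad > 1e-8 else np.array([0.0, 0.0, 1.0])
    gripper = action[6]                   # gripper open/close command
    return delta_pos_m, axis, angle_rad, gripper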
Regarding the meaning of action[:6]: it corresponds to a controller config not yet included in the original ManiSkill2 implementation (agents/configs/panda/defaults.py), added for base-frame conformity with the other Open X-Embodiment datasets:
arm_pd_base_ee_delta_pose = PDEEPoseControllerConfig(
    self.arm_joint_names,
    -0.1,                   # lower bound of the position delta (meters)
    0.1,                    # upper bound of the position delta (meters)
    0.1,                    # bound on the rotation delta (radians)
    self.arm_stiffness,
    self.arm_damping,
    self.arm_force_limit,
    ee_link=self.ee_link_name,
    frame="base",           # deltas are interpreted in the robot base frame
)
This means action[:6] is the "target delta pose" of the tcp with respect to the base frame of the robot. A caveat: for ManiSkill2, the tcp target pose after applying the action is sapien.core.Pose(p=scale * action[:3], q=quaternion_multiplication(axangle2quat(scale * action[3:6]), robot_tcp_pose_wrt_base.q)), where scale = 0.1, robot_tcp_pose_wrt_base = robot.pose.inv() * robot.tcp.pose, and quaternion_multiplication(a, b) = sapien.core.Pose(q=a) * sapien.core.Pose(q=b). For most other subsets of the Open X-Embodiment dataset it is instead sapien.core.Pose(p=robot_tcp_pose_wrt_base.p + (potentially scaled) action[:3], q=quaternion_multiplication(euler2quat(*(potentially scaled action[3:6])), robot_tcp_pose_wrt_base.q)). In other words, in the current tfds ManiSkill dataset we do not decouple the translation and rotation actions when computing the new tcp pose, while the other parts of the Open X-Embodiment dataset do decouple them.
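For concreteness, here is a minimal sketch of the two conventions, assuming sapien 2.x pose math and transforms3d, scale = 0.1, and poses expressed in the robot base frame. The coupled case is written as the single rigid-transform composition delta_pose * tcp_pose_wrt_base (my reading of the frame="base" controller above); the decoupled case transcribes the formula quoted for the other Open X-Embodiment subsets. Function names are illustrative, not actual controller code:

import numpy as np
import sapien.core as sapien
from transforms3d.quaternions import axangle2quat
from transforms3d.euler import euler2quat

def rotvec_to_quat(rotvec):
    """Axis-angle vector (norm = angle in radians) -> wxyz quaternion."""
    theta = np.linalg.norm(rotvec)
    if theta < 1e-8:
        return np.array([1.0, 0.0, 0.0, 0.0])
    return axangle2quat(rotvec / theta, theta)

def coupled_target_pose(action, tcp_pose_wrt_base, scale=0.1):
    """ManiSkill convention (assumed): the scaled delta is applied as one rigid
    transform, so the rotation part also affects the resulting tcp position."""
    delta_pose = sapien.Pose(p=scale * action[:3], q=rotvec_to_quat(scale * action[3:6]))
    return delta_pose * tcp_pose_wrt_base

def decoupled_target_pose(action, tcp_pose_wrt_base, scale=0.1):
    """Convention of most other Open X-Embodiment subsets: translate the current
    position and rotate the current orientation independently."""
    p = tcp_pose_wrt_base.p + scale * action[:3]
    q = (sapien.Pose(q=euler2quat(*(scale * action[3:6]))) * sapien.Pose(q=tcp_pose_wrt_base.q)).q
    return sapien.Pose(p=p, q=q)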
Thank you for your timely and detailed reply! In RT-X, it is noted that action[0:6] is tokenized by dividing each dimension into 256 bins distributed uniformly along that dimension. To do so, one needs the upper and lower bound of each element of action[0:6]. Could you please provide more info about this? Many thanks!
RT-X didn't use ManiSkill data for training (it used a subset of real-world data only). There are no hard upper and lower bounds for the ManiSkill action[:6], but the values are typically within -1 to 1.
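If you still want to tokenize in the RT-X style, here is a minimal sketch, assuming you clip action[:6] to [-1, 1] as a working range (an assumption; as noted above the data has no hard bounds):

import numpy as np

def tokenize(action_6d: np.ndarray, num_bins: int = 256, low: float = -1.0, high: float = 1.0) -> np.ndarray:
    """Map each continuous dimension to an integer bin index in [0, num_bins - 1]."""
    clipped = np.clip(action_6d, low, high)
    idx = np.floor((clipped - low) / (high - low) * num_bins).astype(np.int64)
    return np.minimum(idx, num_bins - 1)   # values exactly at `high` fall into the last bin

def detokenize(idx: np.ndarray, num_bins: int = 256, low: float = -1.0, high: float = 1.0) -> np.ndarray:
    """Recover the bin centers from the integer indices."""
    return low + (idx + 0.5) * (high - low) / num_bins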
Thanks a lot for your help. I'll close the issue.
Hi @xuanlinli17, I wonder if it is possible to:
- train a model on the tfds (RT-X) version of ManiSkill and test it in ManiSkill2. Currently, I'm using:
import gymnasium as gym
import numpy as np
import mani_skill2.envs  # registers the ManiSkill2 environments with gym

camera_cfgs = dict(width=256, height=256)
env = gym.make(
    "PickCube-v0",
    # num_envs=1,
    obs_mode="rgbd",
    control_mode="pd_ee_delta_pose",
    render_mode="human",
    camera_cfgs=camera_cfgs,
)
# control_signal: the model's predicted deltas (a dict produced elsewhere)
action = np.array([
    control_signal['dx'], control_signal['dy'], control_signal['dz'],
    control_signal['droll'], control_signal['dpitch'], control_signal['dyaw'],
    control_signal['gripper'],
])
obs, _, terminated, truncated, _ = env.step(action)
My model outputs xyz, rpy in the same format as the tfds version. It performs very well on my train/val split of the tfds data but does really badly in the ManiSkill2 simulated environment, so I think the configuration of my ManiSkill2 environment differs from that of the tfds dataset. Is there any way for me to reproduce the environment used for the tfds dataset?
- How can I generate datasets with the same control mode as the ManiSkill tfds version? I.e., can I reproduce the tfds dataset generation process?
A lot of thanks!
What is this tfds dataset? If it's one of the ManiSkill2 datasets, I can maybe recreate it for ManiSkill 3.
Thanks for your reply. I'm referring to the RT-X version: https://www.tensorflow.org/datasets/catalog/maniskill_dataset_converted_externally_to_rlds
I tried to automatically generate trajectories in ManiSkill3 and then convert them from joint positions to pd_ee_delta_pose, but the ManiSkill environment doesn't support this conversion yet. Also, it's noted in this issue that: "This means that in the current tfds maniskill dataset, we don't decouple translation and rotation actions when calculating the new tcp pose, while for other parts of the open-x-embodiment dataset they decouple it."
I wonder if there's any way to reproduce the ManiSkill RT-X version. Or can I generate pd_ee_delta_pose datasets in ManiSkill myself, train on them, and test the models in ManiSkill?
Great thanks!
As other people decouple them explicitly, ManiSkill has also shifted to decoupling them by default. I will get back to you later on how to regenerate the datasets. Do you specifically want the tfds format?
Thanks! Yes, I'm interested in having the same control mode in the RT-X-style training data and the ManiSkill virtual environment: xyz, rpy, and gripper signals work for me. Thanks again for your help!
Do you know which frame the xyz rpy signals need to be in? We support a few options; I can prioritize getting the data for the one you need.
Thanks! Sorry, but what do you mean by frame? I'm interested in xyz rpy of the end effector, with all settings the same as the pd_ee_delta_pose control mode, to do tasks in ManiSkill like picking things up, moving a charger, ...
So end-effector control has multiple ways of interpreting XYZ and RPY.
E.g. commanding +z only (and no x or y) can lead to different results depending on whether you pick the robot base frame or the end-effector frame.
See here for more details and videos of how they work: https://maniskill.readthedocs.io/en/latest/user_guide/concepts/controllers.html#pd-ee-end-effector-pose (a toy numerical example is sketched below).
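To make the frame distinction concrete, here is a toy numpy sketch (not ManiSkill code; the pose and numbers are made up) showing how the same +z delta moves the tcp differently in the two frames:

import numpy as np

def rot_x(theta):
    """Rotation matrix about the x-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

tcp_rot = rot_x(np.pi / 4)           # end effector tilted 45 degrees about x
tcp_pos = np.array([0.4, 0.0, 0.3])  # current tcp position in the robot base frame
delta = np.array([0.0, 0.0, 0.05])   # "+z by 5 cm", no x or y motion

pos_if_base_frame = tcp_pos + delta           # base frame: straight up along the base z-axis
pos_if_ee_frame = tcp_pos + tcp_rot @ delta   # ee frame: along the tilted tool z-axis
print(pos_if_base_frame)  # [0.4  0.    0.35]
print(pos_if_ee_frame)    # approx [0.4  -0.035  0.335]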
I think root frame should be good for me. Great thanks!
The conversion tool has been added back @RussRobin via #388; however, the conversion success rate is not guaranteed to be very high. For example, for a task like PegInsertionSide the conversion success rate is about 40%, so you would have to generate a ton of demonstrations and then convert them.
I have also uploaded demos for various tasks in the pd joint position format, so you can run the tool to convert them as done before. Tasks requiring highly precise manipulation will be difficult to convert.
I do have one other solution, which is to use a hybrid action-free imitation learning + reinforcement learning approach: learn from the pd joint position demonstrations and use RL to learn pd delta ee pose controls. I will probably write some code for this approach (it uses our env-state-reset-based online imitation learning method called RFCL, which can do action-free learning).
Hi, could you explain what this means: "use a hybrid action-free imitation learning + reinforcement learning approach to learn from the pd joint position demonstrations + use RL to learn pd delta ee pose controls"? Is there some code I could look at to understand it? I'm very interested.
See my lab's paper on this: https://reverseforward-cl.github.io/