ManiSkill
Bounds for action space
Hi, thank you for this great work.
In the TensorFlow version of your dataset (maniskill_dataset_converted_externally_to_rlds, https://www.tensorflow.org/datasets/catalog/maniskill_dataset_converted_externally_to_rlds), what are the upper and lower bounds of the 7 parameters in steps/action? Also, do these 7 parameters describe where the gripper will be in the next step?
Many thanks in advance.
Actually, the dataset was converted as part of the Open X-Embodiment dataset. The action is unbounded. action[:3] is the xyz movement of the tool-center point (tcp, the center between the two Panda gripper fingers), action[3:6] is the axis-angle movement of the tcp, and action[6:7] is the gripper open/close command. 0.1 times the xyz values maps to meters, and 0.1 times the axis-angle values maps to a rotation about the same axis with its norm in radians (due to the controller configuration below).
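For illustration, here is a minimal sketch (not official dataset code; the helper name is just illustrative) of how one could unpack a single 7-dim action following the description above:

import numpy as np

def interpret_action(action: np.ndarray, scale: float = 0.1):
    """Split one 7-dim action from the dataset into physical quantities."""
    delta_pos_m = scale * action[:3]      # tcp translation, in meters
    rotvec = scale * action[3:6]          # axis-angle: direction = axis, norm = angle in radians
    angle_rad = np.linalg.norm(rotvec)
    axis = rotvec / angle_rad if angle_rad > 1e-8 else np.array([0.0, 0.0, 1.0])
    gripper = action[6]                   # gripper open/close command
    return delta_pos_m, axis, angle_rad, gripper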
Regarding the meaning of action[:6]: it corresponds to a controller config not yet included in the original ManiSkill2 implementation (agents/configs/panda/defaults.py), added for base-frame conformity with the other Open X-Embodiment datasets:
arm_pd_base_ee_delta_pose = PDEEPoseControllerConfig(
    self.arm_joint_names,
    -0.1,                   # lower bound of the position delta (meters)
    0.1,                    # upper bound of the position delta (meters)
    0.1,                    # bound on the rotation delta (radians)
    self.arm_stiffness,
    self.arm_damping,
    self.arm_force_limit,
    ee_link=self.ee_link_name,
    frame="base",           # deltas are interpreted in the robot base frame
)
This means action[:6] is the "target delta pose" of the tcp with respect to the base frame of the robot. A caveat: for ManiSkill2, the tcp target pose after applying the action is sapien.core.Pose(p=scale * action[:3], q=quaternion_multiplication(axangle2quat(scale * action[3:6]), robot_tcp_pose_wrt_base.q)), where scale = 0.1, robot_tcp_pose_wrt_base = robot.pose.inv() * robot.tcp.pose, and quaternion_multiplication(a, b) = sapien.core.Pose(q=a) * sapien.core.Pose(q=b). For most other subsets of the Open X-Embodiment dataset it is instead sapien.core.Pose(p=robot_tcp_pose_wrt_base.p + (potentially scaled) action[:3], q=quaternion_multiplication(euler2quat(*(potentially scaled action[3:6])), robot_tcp_pose_wrt_base.q)). In other words, in the current tfds ManiSkill dataset we do not decouple the translation and rotation actions when computing the new tcp pose, while the other parts of the Open X-Embodiment dataset do decouple them.
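For concreteness, here is a minimal sketch of the two conventions, assuming sapien 2.x pose math and transforms3d, scale = 0.1, and poses expressed in the robot base frame. The coupled case is written as the single rigid-transform composition delta_pose * tcp_pose_wrt_base (my reading of the frame="base" controller above); the decoupled case transcribes the formula quoted for the other Open X-Embodiment subsets. Function names are illustrative, not actual controller code:

import numpy as np
import sapien.core as sapien
from transforms3d.quaternions import axangle2quat
from transforms3d.euler import euler2quat

def rotvec_to_quat(rotvec):
    """Axis-angle vector (norm = angle in radians) -> wxyz quaternion."""
    theta = np.linalg.norm(rotvec)
    if theta < 1e-8:
        return np.array([1.0, 0.0, 0.0, 0.0])
    return axangle2quat(rotvec / theta, theta)

def coupled_target_pose(action, tcp_pose_wrt_base, scale=0.1):
    """ManiSkill convention (assumed): the scaled delta is applied as one rigid
    transform, so the rotation part also affects the resulting tcp position."""
    delta_pose = sapien.Pose(p=scale * action[:3], q=rotvec_to_quat(scale * action[3:6]))
    return delta_pose * tcp_pose_wrt_base

def decoupled_target_pose(action, tcp_pose_wrt_base, scale=0.1):
    """Convention of most other Open X-Embodiment subsets: translate the current
    position and rotate the current orientation independently."""
    p = tcp_pose_wrt_base.p + scale * action[:3]
    q = (sapien.Pose(q=euler2quat(*(scale * action[3:6]))) * sapien.Pose(q=tcp_pose_wrt_base.q)).q
    return sapien.Pose(p=p, q=q)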
Thank you for your timely and detailed reply! In RT-X, it is noted that action[0:6] is tokenized by dividing each dimension into 256 bins distributed uniformly along that dimension. To do so, one needs the upper and lower bound of each element of action[0:6]. Could you please provide more info about this? Many thanks!
RT-X didn't use ManiSkill data for training (it used a subset of real-world data only). There are no hard upper and lower bounds for the ManiSkill action[:6], but the values are typically within -1 to 1.
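If you still want to tokenize in the RT-X style, here is a minimal sketch, assuming you clip action[:6] to [-1, 1] as a working range (an assumption; as noted above the data has no hard bounds):

import numpy as np

def tokenize(action_6d: np.ndarray, num_bins: int = 256, low: float = -1.0, high: float = 1.0) -> np.ndarray:
    """Map each continuous dimension to an integer bin index in [0, num_bins - 1]."""
    clipped = np.clip(action_6d, low, high)
    idx = np.floor((clipped - low) / (high - low) * num_bins).astype(np.int64)
    return np.minimum(idx, num_bins - 1)   # values exactly at `high` fall into the last bin

def detokenize(idx: np.ndarray, num_bins: int = 256, low: float = -1.0, high: float = 1.0) -> np.ndarray:
    """Recover the bin centers from the integer indices."""
    return low + (idx + 0.5) * (high - low) / num_bins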
Thanks a lot for your help. I'll close the issue.
Hi @xuanlinli17, I wonder if it is possible to:
- train a model on the tfds (RT-X) version of ManiSkill and test it in ManiSkill2. Currently, I'm using:
import gymnasium as gym
import numpy as np
import mani_skill2.envs  # registers the ManiSkill2 environments with gym

camera_cfgs = dict(width=256, height=256)
env = gym.make(
    "PickCube-v0",
    # num_envs=1,
    obs_mode="rgbd",
    control_mode="pd_ee_delta_pose",
    render_mode="human",
    camera_cfgs=camera_cfgs,
)
# control_signal: the model's predicted deltas (a dict produced elsewhere)
action = np.array([
    control_signal['dx'], control_signal['dy'], control_signal['dz'],
    control_signal['droll'], control_signal['dpitch'], control_signal['dyaw'],
    control_signal['gripper'],
])
obs, _, terminated, truncated, _ = env.step(action)
My model outputs xyz, rpy in the same format as the tfds version. It performs very well on my train/val split of the tfds data but does really badly in the ManiSkill2 simulated environment, so I think the configuration of my ManiSkill2 environment differs from that of the tfds dataset. Is there any way for me to reproduce the environment used for the tfds dataset?
- How can I generate datasets with the same control mode as the ManiSkill tfds version? I.e., can I reproduce the tfds dataset generation process?
A lot of thanks!
What is this tfds dataset? If it's one of the ManiSkill2 datasets, I can maybe recreate it for ManiSkill 3.
Thanks for your reply. I'm referring to the RT-X version: https://www.tensorflow.org/datasets/catalog/maniskill_dataset_converted_externally_to_rlds
I tried to automatically generate trajectories in ManiSkill3 and then convert them from joint positions to pd_ee_delta_pose, but the ManiSkill environment doesn't support this conversion yet. Also, it's noted in this issue that: "This means that in the current tfds maniskill dataset, we don't decouple translation and rotation actions when calculating the new tcp pose, while for other parts of the open-x-embodiment dataset they decouple it."
I wonder if there's any way to reproduce the ManiSkill RT-X version. Or can I generate pd_ee_delta_pose datasets in ManiSkill myself, train on them, and test the models in ManiSkill?
Great thanks!
As other people decouple them explicitly, ManiSkill has also shifted to decoupling them by default. I will get back to you later on how to regenerate the datasets. Do you specifically want the tfds format?
Thanks! Yes, I'm interested in having the same control mode in the RT-X-style training data and the ManiSkill virtual environment: xyz, rpy, and gripper signals work for me. Thanks again for your help!
Do you know which frame the xyz rpy signals need to be in? We support a few options; I can prioritize getting the data for the one you need.
Thanks! Sorry, but what do you mean by frame? I'm interested in xyz rpy of the end effector, with all settings the same as the pd_ee_delta_pose control mode, to do tasks in ManiSkill like picking things up, moving a charger, ...
So end-effector control has multiple ways of interpreting XYZ and RPY.
E.g. commanding +z only (and no x or y) can lead to different results depending on whether you pick the robot base frame or the end-effector frame.
See here for more details and videos of how they work: https://maniskill.readthedocs.io/en/latest/user_guide/concepts/controllers.html#pd-ee-end-effector-pose (a toy numerical example is sketched below).
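To make the frame distinction concrete, here is a toy numpy sketch (not ManiSkill code; the pose and numbers are made up) showing how the same +z delta moves the tcp differently in the two frames:

import numpy as np

def rot_x(theta):
    """Rotation matrix about the x-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

tcp_rot = rot_x(np.pi / 4)           # end effector tilted 45 degrees about x
tcp_pos = np.array([0.4, 0.0, 0.3])  # current tcp position in the robot base frame
delta = np.array([0.0, 0.0, 0.05])   # "+z by 5 cm", no x or y motion

pos_if_base_frame = tcp_pos + delta           # base frame: straight up along the base z-axis
pos_if_ee_frame = tcp_pos + tcp_rot @ delta   # ee frame: along the tilted tool z-axis
print(pos_if_base_frame)  # [0.4  0.    0.35]
print(pos_if_ee_frame)    # approx [0.4  -0.035  0.335]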
I think root frame should be good for me. Great thanks!
The conversion tool has been added back @RussRobin via #388; however, the conversion success rate is not guaranteed to be very high. For example, for a task like PegInsertionSide the conversion success rate is about 40%, so you would have to generate a ton of demonstrations and then convert them.
I have also uploaded demos for various tasks in the pd joint position format, so you can run the tool to convert them as done before. Tasks requiring highly precise manipulation will be difficult to convert.
I do have one other solution, which is to use a hybrid action-free imitation learning + reinforcement learning approach: learn from the pd joint position demonstrations and use RL to learn pd delta ee pose controls. I will probably write some code for this approach (it uses our env-state-reset-based online imitation learning method called RFCL, which can do action-free learning).
Hi, could you explain what this means: "use a hybrid action-free imitation learning + reinforcement learning approach to learn from the pd joint position demonstrations + use RL to learn pd delta ee pose controls"? Is there some code I could look at to understand it? I'm very interested.
See my lab's paper on this: https://reverseforward-cl.github.io/