Is there any way to speed this up? Running only `pick_coke_can_variant_agg.sh` takes 40 minutes.
I don't use all of coke_can_options_arr (# declare -a coke_can_options_arr=("lr_switch=True" "upright=True" "laid_vertically=True"))
and only keep declare -a coke_can_options_arr=("upright=True")
But it still takes 40 minutes. Is there any way to speed it up, or a smaller test case?
I tried to only run the first block (scene_name=google_pick_coke_can_1_v4) and skip the others such as declare -a scene_arr=("Baked_sc1_staging_objaverse_cabinet1_h870" ..., but then I cannot get the standing sim variant avg success.
How can I get the standing sim variant avg success when I don't run all the blocks?
It's quite strange. It looks like the GPU might be unused during eval (pure CPU eval is very slow). What GPU are you using, and what's the GPU utilization?
You need to keep all scenes to get the standing sim variant avg success.
Hi, it shows:
Compute  0%   16214MiB  17%  0%  3870MiB  python simpler_env/main_inference.py
Graphic  11%  16214MiB  17%  0%  3931MiB  python simpler_env/main_inference.py --policy-model
Maybe the processes are not using the GPU for compute. I am using an H100.
Do you have the correct tensorflow version?
pip install tensorflow==2.15.0
pip install -r requirements_full_install.txt
pip install tensorflow[and-cuda]==2.15.1 # tensorflow gpu support
Also if you are evaluating Octo, see https://github.com/simpler-env/SimplerEnv?tab=readme-ov-file#octo-inference-setup
If the model is not using the GPU, a warning will be raised.
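If you want to double check, here is a quick sanity test (just standard TensorFlow device listing, not part of this repo):

import tensorflow as tf
# An empty list here means TensorFlow does not see any GPU, so inference falls back to CPU.
print(tf.config.list_physical_devices("GPU"))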
Thank you! Do you have any advice for the torch version? I am facing this problem: https://github.com/simpler-env/SimplerEnv/issues/30#issuecomment-2799036369. I am using torch 2.2.0, but it is incompatible with the CUDA libraries that tensorflow[and-cuda]==2.15.1 depends on.
I don't think any part of the existing code uses torch? If so, you can just install CPU torch, I think. I locally have torch 2.3.1, but I haven't tested it for a long time.
Thank you. torch 2.3.1 works fine for my OpenVLA! I am using tensorflow 2.15.1.
But the speed is still slow: three minutes for 11 run_maniskill2_eval_single_episode calls.
pick coke can
OrderedDict([('n_lift_significant', 0), ('consec_grasp', False), ('grasped', False)])
.... (11 times, all false)
I think most of the time is spent in rendering: the Graphic utilization is 3-4x higher than Compute. I will try not producing videos.
All policies take RGB images as input, so RGB images are always rendered. If graphics is the problem, then I think ffmpeg might be causing the issue. In that case (as for basically all ffmpeg-caused slow video saving issues), it means there is a lack of system memory (and if you use a cluster, other jobs might be taking too much memory).
Generally, on a 4090 a single episode of pick coke can takes about 5 seconds for RT-1.
Pick coke can uses rasterization, so there's no ray tracing and the env shouldn't be slow on non-RTX GPUs; you can try to benchmark the time of each component to see which is the slowest.
My memory should be fine: 215 GB / 756 GB used. Did you mean there is no option to skip producing videos?
If so then ffmpeg shouldn't be the bottleneck. It might take several seconds to save 11 videos.
Sorry, I forgot to mention an important thing: maybe it is because every one of my episodes fails, so each one reaches the max number of steps, and that's why it is slow?
raw_action, action = model.step(image, task_description)
predicted_actions.append(raw_action)
predicted_terminated = bool(action["terminate_episode"][0] > 0)
if predicted_terminated:
    if not is_final_subtask:
        # advance the environment to the next subtask
        predicted_terminated = False
        env.advance_to_next_subtask()
model.step Step time: 0.039
env_start_time = time.time()
# step the environment
obs, reward, done, truncated, info = env.step(
    np.concatenate(
        [action["world_vector"], action["rot_axangle"], action["gripper"]]
    ),
)
env_end_time = time.time()
print(f"Env step time: {env_end_time - env_start_time}")
Env step time: 0.048
video_start_time = time.time()
for k, v in additional_env_build_kwargs.items():
    env_save_name = env_save_name + f"_{k}_{v}"
....
video_path = os.path.join(logging_dir, video_path)
write_video(video_path, images, fps=5)
video_end_time = time.time()
print(f"Video write time: {video_end_time - video_start_time}")
action_start_time = time.time()
# save action trajectory
action_path = video_path.replace(".mp4", ".png")
action_root = os.path.dirname(action_path) + "/actions/"
os.makedirs(action_root, exist_ok=True)
action_path = action_root + os.path.basename(action_path)
model.visualize_epoch(predicted_actions, images, save_path=action_path)
action_end_time = time.time()
print(f"Action save time: {action_end_time - action_start_time}")
Video write time: 0.2038
Action save time: 0.659
One episode takes 10.73 s on an H100.
Env steps (max 75 steps per episode): 0.039 * 75 < 3 s, so I think the policy forward takes ~7 s; you can check.
The total while not (predicted_terminated or truncated): loop takes 7.29 - 8.37 s.
model.step() takes 0.039 s/step
env.step() takes 0.048 s/step
0.039 + 0.048 = 0.087
Total timesteps: 80, and 0.087 * 80 = 6.96 s
But that doesn't add up to 10 s?
The first (preparation) part takes 1.76-1.91 s:
if additional_env_build_kwargs is None:
    additional_env_build_kwargs = {}
# Create environment
.....
# Initialize logging
image = get_image_from_maniskill2_obs_dict(env, obs, camera_name=obs_camera_name)  # can we use the wrist camera?
images = [image]
predicted_actions = []
predicted_terminated, done, truncated = False, False, False
# Initialize model
model.reset(task_description)
timestep = 0
success = "failure"
The second part, the while not (predicted_terminated or truncated): loop, takes 7.29 - 8.37 s.
The third part, saving the video and actions, takes around 0.85 s.
So in total, roughly 1.76 + 7.5 + 0.85 = 10.11 s.
Yeah, for the same env, the 1.76 s can be saved by not re-creating the env and just resetting it with a different robot & object pose; but the majority of the time is still ~50% policy forward and ~50% env step.
Thank you for your time! I will keep thinking
Essentially the way to speed up both model inference & env is via parallelizing envs; it's already done in ManiSkill3 for widowx envs but not yet for google robot envs.
Hi @xuanlinli17 @LukeLIN-web, I have a question that might be stupid, can we return early from an episode once success has been achieved?
For example, if success becomes True at the 12th step, all the remaining steps seem unnecessary.
Would it be okay to just break out of the loop at that point?
Yes, you can modify the evaluation code to return early.
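For reference, a minimal sketch of what that could look like inside the rollout loop (this assumes the env reports task success through done or an info["success"] flag; verify the actual keys of your env before relying on it):

obs, reward, done, truncated, info = env.step(
    np.concatenate([action["world_vector"], action["rot_axangle"], action["gripper"]])
)
if done or info.get("success", False):
    success = "success"
    break  # skip the remaining steps once the task is already solved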
Got it. Thanks!
great idea!
Saving the video and actions takes around 0.85 s.
And we have to save the videos, otherwise we cannot compute the metrics right now.
I evaluated the Google Robot tasks:
tasks = [
"pick_coke_can_visual_matching.sh",
"pick_coke_can_variant_agg.sh",
"move_near_variant_agg.sh",
"move_near_visual_matching.sh",
"drawer_visual_matching.sh",
"drawer_variant_agg.sh",
]
It takes me 16 hours on an A6000 to evaluate the Google Robot tasks, which is really sad.
And the generated MP4s take 9.0 GB to store.
Is it OK to parallelize it? I am worried that there may be issues when running multiple instances on one machine (I have had experience with other software where running multiple instances causes bugs).
Bridge envs are parallelized in ManiSkill3. Google Robot envs tbd.
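In the meantime, if you want to try coarse process-level parallelism, a rough, untested sketch is below; it launches the scripts listed above concurrently, pinning one GPU per run via CUDA_VISIBLE_DEVICES. The scripts/ path and GPU count are assumptions, and whether several SAPIEN instances coexist cleanly on one machine is exactly the open question here.

import os
import subprocess

tasks = [
    "pick_coke_can_visual_matching.sh",
    "pick_coke_can_variant_agg.sh",
    "move_near_variant_agg.sh",
    "move_near_visual_matching.sh",
    "drawer_visual_matching.sh",
    "drawer_variant_agg.sh",
]
num_gpus = 2  # assumption: set this to the number of GPUs on your machine

procs = []
for i, script in enumerate(tasks):
    env = os.environ.copy()
    env["CUDA_VISIBLE_DEVICES"] = str(i % num_gpus)  # pin each run to one GPU
    # note: this launches everything at once; with few GPUs you may want to batch the runs
    procs.append(subprocess.Popen(["bash", os.path.join("scripts", script)], env=env))
for p in procs:
    p.wait()  # block until every evaluation script has finished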