
Any speed-up method? Running only `pick_coke_can_variant_agg.sh` takes 40 mins

Open · LukeLIN-web opened this issue 8 months ago · 28 comments

I don't use all of coke_can_options_arr: instead of `declare -a coke_can_options_arr=("lr_switch=True" "upright=True" "laid_vertically=True")` I only keep `declare -a coke_can_options_arr=("upright=True")`. But it still takes 40 mins. Is there any speed-up method, or a smaller test case?

I tried running only the first block (scene_name=google_pick_coke_can_1_v4) and skipping the others like `declare -a scene_arr=("Baked_sc1_staging_objaverse_cabinet1_h870" ...)`, but then I cannot get the standing sim variant avg success. How can I get the standing sim variant avg success without running all the blocks?

LukeLIN-web · Apr 11, 2025

It's quite strange. It looks like the GPU might be unused during eval (pure CPU eval is very slow). What GPU are you using, and what's the GPU utilization?

You need to keep all scenes to get the standing sim variant avg success.
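Roughly speaking, that aggregate is just the mean success rate over all the variant runs, so dropping scenes changes what is being averaged. A minimal sketch with hypothetical numbers:

# Sketch: the variant avg is a mean over all scene/option runs.
# The success rates below are hypothetical placeholders.
success_rates = {
    "google_pick_coke_can_1_v4": 0.60,
    "Baked_sc1_staging_objaverse_cabinet1_h870": 0.45,
    # ... the remaining variant scenes in the script
}
variant_avg = sum(success_rates.values()) / len(success_rates)
print(f"standing sim variant avg success: {variant_avg:.3f}")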

xuanlinli17 · Apr 11, 2025

> It's quite strange. It looks like the GPU might be unused during eval (pure CPU eval is very slow). What GPU are you using, and what's the GPU utilization?
>
> You need to keep all scenes to get the standing sim variant avg success.

Hi, here is the per-process GPU utilization:

Compute 0% | 16214MiB 17% | 0% | 3870MiB | python simpler_env/main_inference.py
Graphic 11% | 16214MiB 17% | 0% | 3931MiB | python simpler_env/main_inference.py --policy-model ...

Maybe the processes are not using the GPU for compute. I am using an H100.

LukeLIN-web · Apr 12, 2025

Do you have the correct tensorflow version?

pip install tensorflow==2.15.0
pip install -r requirements_full_install.txt
pip install tensorflow[and-cuda]==2.15.1 # tensorflow gpu support
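After installing, a quick sanity check that TensorFlow actually sees the GPU (prints an empty list on a CPU-only setup):

python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"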

xuanlinli17 · Apr 12, 2025

Also, if you are evaluating Octo, see https://github.com/simpler-env/SimplerEnv?tab=readme-ov-file#octo-inference-setup

xuanlinli17 · Apr 12, 2025

If the model is not using the GPU, a warning will be raised.

xuanlinli17 · Apr 12, 2025

> Do you have the correct tensorflow version?
>
> pip install tensorflow==2.15.0
> pip install -r requirements_full_install.txt
> pip install tensorflow[and-cuda]==2.15.1 # tensorflow gpu support

Thank you! Do you have any advice for the torch version? I am facing this problem: https://github.com/simpler-env/SimplerEnv/issues/30#issuecomment-2799036369 . I am using torch 2.2.0, but it is incompatible with the CUDA libraries that tensorflow[and-cuda]==2.15.1 depends on.

LukeLIN-web · Apr 12, 2025

I don't think any part of the existing code uses torch? If so, you can just install CPU torch, I think. I locally have torch 2.3.1, but I haven't tested it in a long time.
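For example, a CPU-only build can be pulled from PyTorch's CPU wheel index (one possible pin; untested with this repo):

pip install torch==2.3.1 --index-url https://download.pytorch.org/whl/cpu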

xuanlinli17 · Apr 12, 2025

> I don't think any part of the existing code uses torch? If so, you can just install CPU torch, I think. I locally have torch 2.3.1, but I haven't tested it in a long time.

Thank you, torch 2.3.1 works fine for my OpenVLA! I am using tensorflow 2.15.1, but the speed is still slow: three minutes for 11 runs of run_maniskill2_eval_single_episode:

pick coke can
OrderedDict([('n_lift_significant', 0), ('consec_grasp', False), ('grasped', False)])
.... (repeated 11 times, all false)
pick coke can
OrderedDict([('n_lift_significant', 0), ('consec_grasp', False), ('grasped', False)])

I think most of the time is spent in rendering; graphics takes 3-4x longer than compute. I will try not producing videos.

LukeLIN-web · Apr 12, 2025

All policies take RGB images as input, so RGB images are always rendered. If graphics is the problem, then I think ffmpeg might be causing the issue. In this case (as with basically all ffmpeg-caused slow video saving issues), it means there is a lack of system memory (and if you use a cluster, other jobs might be taking too much memory).

xuanlinli17 · Apr 12, 2025

Generally, on a 4090, a single episode of pick coke can takes about 5 seconds for RT-1.

xuanlinli17 · Apr 12, 2025

Pick coke can uses rasterization, so there is no ray tracing and the env shouldn't be slow on non-RTX GPUs; you can benchmark the time of each component to see which is slowest.
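For example, a minimal timing sketch around the two main calls in the eval loop (model, env, image, and task_description come from the loop; action_vec stands for the concatenated action vector):

import time

t0 = time.time()
raw_action, action = model.step(image, task_description)
print(f"model.step: {time.time() - t0:.3f} s")

t0 = time.time()
obs, reward, done, truncated, info = env.step(action_vec)
print(f"env.step: {time.time() - t0:.3f} s")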

xuanlinli17 · Apr 12, 2025

> All policies take RGB images as input, so RGB images are always rendered. If graphics is the problem, then I think ffmpeg might be causing the issue. In this case (as with basically all ffmpeg-caused slow video saving issues), it means there is a lack of system memory (and if you use a cluster, other jobs might be taking too much memory).

My memory should be fine: 215GB / 756GB in use. Did you mean there is no option to skip producing videos?

LukeLIN-web · Apr 12, 2025

If so, then ffmpeg shouldn't be the bottleneck. It might still take several seconds to save 11 videos, though.

xuanlinli17 · Apr 12, 2025

Sorry, I forgot to mention an important thing: maybe it is slow because every one of my episodes fails, so each one runs to the max step count?

        raw_action, action = model.step(image, task_description)
        predicted_actions.append(raw_action)
        predicted_terminated = bool(action["terminate_episode"][0] > 0)
        if predicted_terminated:
            if not is_final_subtask:
                # advance the environment to the next subtask
                predicted_terminated = False
                env.advance_to_next_subtask()

model.step time: 0.039 s

        env_start_time = time.time()
        # step the environment
        obs, reward, done, truncated, info = env.step(
            np.concatenate(
                [action["world_vector"], action["rot_axangle"], action["gripper"]]
            ),
        )
        env_end_time = time.time()
        print(f"Env step time: {env_end_time - env_start_time}")

Env step time: 0.048


    video_start_time = time.time()
    for k, v in additional_env_build_kwargs.items():
        env_save_name = env_save_name + f"_{k}_{v}"
    .... 
    video_path = os.path.join(logging_dir, video_path)
    write_video(video_path, images, fps=5)
    video_end_time = time.time()
    print(f"Video write time: {video_end_time - video_start_time}")

    action_start_time = time.time()
    # save action trajectory
    action_path = video_path.replace(".mp4", ".png")
    action_root = os.path.dirname(action_path) + "/actions/"
    os.makedirs(action_root, exist_ok=True)
    action_path = action_root + os.path.basename(action_path)
    model.visualize_epoch(predicted_actions, images, save_path=action_path)
    action_end_time = time.time()
    print(f"Action save time: {action_end_time - action_start_time}")

Video write time: 0.2038
Action save time: 0.659

One episode takes 10.73 s in total on an H100.

LukeLIN-web · Apr 12, 2025

Env steps (max 75 steps per episode): 0.039 * 75 < 3 s, so I think the policy forward takes ~7 s; you can check.

xuanlinli17 · Apr 12, 2025

> Env steps (max 75 steps per episode): 0.039 * 75 < 3 s, so I think the policy forward takes ~7 s; you can check.

The total `while not (predicted_terminated or truncated):` loop takes 7.29-8.37 s. model.step() takes 0.039 s/step and env.step() takes 0.048 s/step.

0.039 + 0.048 = 0.087 s per step

Total timesteps: 80, so 0.087 * 80 = 6.96 s.

LukeLIN-web · Apr 12, 2025

But it doesn't add up to 10s?

xuanlinli17 · Apr 12, 2025

First, the preparation part takes 1.76-1.91 s:

    if additional_env_build_kwargs is None:
        additional_env_build_kwargs = {}

    # Create environment
  ..... 
    # Initialize logging
    image = get_image_from_maniskill2_obs_dict(env, obs, camera_name=obs_camera_name)  # can we use the wrist camera instead?
    images = [image]
    predicted_actions = []
    predicted_terminated, done, truncated = False, False, False

    # Initialize model
    model.reset(task_description)

    timestep = 0
    success = "failure"

Second part: the `while not (predicted_terminated or truncated):` loop takes 7.29-8.37 s.

Third part: saving the video and action plot takes around 0.85 s.

So in total: roughly 1.76 + 7.5 + 0.85 = 10.11 s.

LukeLIN-web · Apr 12, 2025

Yeah, for the same env the 1.76 s can be saved by not re-creating the env and instead just resetting it with a different robot & object pose; but the majority of the time is still ~50% policy forward and ~50% env step.
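A rough sketch of that reuse pattern (build_env, run_episode, and the reset options here are hypothetical placeholders, not the repo's actual API):

# Build the env once (~1.8 s), then reset between episodes instead of rebuilding.
env = build_env(env_name, **additional_env_build_kwargs)
for episode_options in all_episode_options:  # per-episode robot & object poses
    obs, info = env.reset(options=episode_options)  # cheap compared to a rebuild
    run_episode(env, model, obs)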

xuanlinli17 · Apr 12, 2025

> Yeah, for the same env the 1.76 s can be saved by not re-creating the env and instead just resetting it with a different robot & object pose; but the majority of the time is still ~50% policy forward and ~50% env step.

Thank you for your time! I will keep thinking about it.

LukeLIN-web · Apr 13, 2025

Essentially, the way to speed up both model inference and the env is to parallelize envs; this is already done in ManiSkill3 for the WidowX envs, but not yet for the Google Robot envs.
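In the ManiSkill3 version that looks roughly like this (a sketch; the env id is illustrative and the exact kwargs may differ):

import gymnasium as gym
import mani_skill.envs  # registers the ManiSkill3 envs

# One GPU-batched simulation with num_envs parallel instances; a single
# policy forward can then serve the whole batch of observations.
env = gym.make("PutCarrotOnPlateInScene-v1", num_envs=16)
obs, info = env.reset()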

xuanlinli17 · Apr 13, 2025

Hi @xuanlinli17 @LukeLIN-web, I have a question that might be stupid: can we return early from an episode once success has been achieved?

For example, if success becomes True at the 12th step, all the remaining steps seem unnecessary. Would it be okay to just break out of the loop at that point?


jasper0314-huang · Apr 16, 2025

Yes, you can modify the evaluation code to return early.
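For example, inside the stepping loop (a minimal sketch, assuming the env reports success through the info dict; action_vec is the concatenated action vector):

obs, reward, done, truncated, info = env.step(action_vec)
if info.get("success", False):
    break  # no need to run out the remaining steps once the episode succeeded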

xuanlinli17 · Apr 16, 2025

Got it. Thanks!

jasper0314-huang · Apr 16, 2025

> Hi @xuanlinli17 @LukeLIN-web, I have a question that might be stupid: can we return early from an episode once success has been achieved?
>
> For example, if success becomes True at the 12th step, all the remaining steps seem unnecessary. Would it be okay to just break out of the loop at that point?

Great idea!

LukeLIN-web · Apr 16, 2025

> Third part: saving the video and action plot takes around 0.85 s.

And we have to save the videos; otherwise the metrics cannot be computed right now.
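If the metrics are recovered from the saved files, one possible workaround (a sketch, untested) is to keep the success-encoding filename but cut the encoding work down to a single frame:

# Hypothetical tweak: write a one-frame placeholder clip instead of the full
# video, keeping video_path (whose name encodes the outcome) for the metrics.
write_video(video_path, images[:1], fps=5)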

I evaluated the Google Robot tasks:

tasks = [
    "pick_coke_can_visual_matching.sh",
    "pick_coke_can_variant_agg.sh",
    "move_near_variant_agg.sh",
    "move_near_visual_matching.sh",
    "drawer_visual_matching.sh",
    "drawer_variant_agg.sh",
]

It takes me 16 hours on an A6000 to evaluate the Google Robot tasks, which is really sad.

And the generated MP4s take 9.0 GB of storage.

LukeLIN-web · Apr 18, 2025

Is it OK to parallelize it? I am worried that there may be issues when running multiple instances on one machine. (I have had experience with other software where running multiple instances causes bugs.)

Boltzmachine · Jun 09, 2025

Bridge envs are parallelized in ManiSkill3. Google Robot envs are TBD.

xuanlinli17 · Jun 09, 2025