act-plus-plus Diffusion Policy parameters

Hi authors,

I used your 50 demo episodes to train ACT and it worked very well, achieving a success rate of up to 90% on the cube-transfer task. However, after I switched the algorithm to Diffusion Policy, the success rate was very low. I tried multiple hyperparameter settings from your commands.txt, but none of them worked. The results are below:

(Green is ACT; the other curves are Diffusion Policy with different hyperparameter sets.)

So I wonder why this happens. Could you share the best-working Diffusion Policy parameters? Thank you very much!

Jan 12 '24 11:01 yxKryptonite

Hi authors, thanks for your work! I tried multiple sets of Diffusion Policy parameters, but none of them worked. Could you please advise how to train Diffusion Policy on the cube-transfer task, or provide any trained checkpoints? Thank you very much!

Jan 20 '24 12:01 yxKryptonite

@yxKryptonite, same question here. Have you tried a command like this?

conda activate mobile
export MUJOCO_GL=egl
cd /home/tonyzhao/Research/act-plus-plus
CUDA_VISIBLE_DEVICES=0 python3 imitate_episodes.py \
    --task_name sim_transfer_cube_scripted \
    --ckpt_dir /scr/tonyzhao/train_logs/cube_scripted_diffusion_sweep_4_regressionTest \
    --policy_class Diffusion --chunk_size 32 \
    --batch_size 32 --lr 1e-4 --seed 0 \
    --num_steps 200000 --eval_every 6000 --validate_every 6000 --save_every 6000

Apr 02 '24 01:04 Wallong

@yxKryptonite, I ran into this problem as well. Have you solved it?

May 06 '24 13:05 LanrenzzzZ

Hello,

Same question here. We trained on the Ziploc Slide task (same task, but with a dataset we collected ourselves), and ACT worked well. Then we tried the Diffusion Policy class. During training, the validation losses of Diffusion Policy were much lower than those of ACT. During inference, however, Diffusion Policy did not work: the robots could not even start the task. I will share the error later. It was an error about "self.ema.averaged_model".

`

class DiffusionPolicy(nn.Module):
def __init__(self, args_override):
    super().__init__()

    self.camera_names = args_override['camera_names']
    self.observation_horizon = args_override['observation_horizon'] ### TODO TODO TODO DO THIS
    self.action_horizon = args_override['action_horizon'] # apply chunk size
    self.prediction_horizon = args_override['prediction_horizon'] # chunk size
    self.num_inference_timesteps = args_override['num_inference_timesteps']
    self.ema_power = args_override['ema_power']
    self.lr = args_override['lr']
    self.weight_decay = 0

    self.num_kp = 32
    self.feature_dimension = 64
    self.ac_dim = args_override['action_dim'] # 14 + 2
    self.obs_dim = self.feature_dimension * len(self.camera_names) + 14 # camera features and proprio

    backbones = []
    pools = []
    linears = []
    for _ in self.camera_names:
        backbones.append(ResNet18Conv(**{'input_channel': 3, 'pretrained': False, 'input_coord_conv': False}))
        pools.append(SpatialSoftmax(**{'input_shape': [512, 15, 20], 'num_kp': self.num_kp, 'temperature': 1.0, 'learnable_temperature': False, 'noise_std': 0.0}))
        linears.append(torch.nn.Linear(int(np.prod([self.num_kp, 2])), self.feature_dimension))
    backbones = nn.ModuleList(backbones)
    pools = nn.ModuleList(pools)
    linears = nn.ModuleList(linears)
    
    backbones = replace_bn_with_gn(backbones) # TODO


    noise_pred_net = ConditionalUnet1D(
        input_dim=self.ac_dim,
        global_cond_dim=self.obs_dim*self.observation_horizon
    )

    nets = nn.ModuleDict({
        'policy': nn.ModuleDict({
            'backbones': backbones,
            'pools': pools,
            'linears': linears,
            'noise_pred_net': noise_pred_net
        })
    })

    nets = nets.float().cuda()
    ENABLE_EMA = True
    if ENABLE_EMA:
        ema = EMAModel(parameters=nets, power=self.ema_power)#power=self.ema_power
    else:
        ema = None
    self.nets = nets
    self.ema = ema

    # setup noise scheduler
    self.noise_scheduler = DDIMScheduler(
        num_train_timesteps=50,
        beta_schedule='squaredcos_cap_v2',
        clip_sample=True,
        set_alpha_to_one=True,
        steps_offset=0,
        prediction_type='epsilon'
    )

    n_parameters = sum(p.numel() for p in self.parameters())
    print("number of parameters: %.2fM" % (n_parameters/1e6,))


def configure_optimizers(self):
    optimizer = torch.optim.AdamW(self.nets.parameters(), lr=self.lr, weight_decay=self.weight_decay)
    return optimizer


def __call__(self, qpos, image, actions=None, is_pad=None):
    B = qpos.shape[0]
    if actions is not None: # training time
        nets = self.nets
        all_features = []
        for cam_id in range(len(self.camera_names)):
            cam_image = image[:, cam_id]
            cam_features = nets['policy']['backbones'][cam_id](cam_image)
            pool_features = nets['policy']['pools'][cam_id](cam_features)
            pool_features = torch.flatten(pool_features, start_dim=1)
            out_features = nets['policy']['linears'][cam_id](pool_features)
            all_features.append(out_features)

        obs_cond = torch.cat(all_features + [qpos], dim=1)

        # sample noise to add to actions
        noise = torch.randn(actions.shape, device=obs_cond.device)
        
        # sample a diffusion iteration for each data point
        timesteps = torch.randint(
            0, self.noise_scheduler.config.num_train_timesteps, 
            (B,), device=obs_cond.device
        ).long()
        
        # add noise to the clean actions according to the noise magnitude at each diffusion iteration
        # (this is the forward diffusion process)
        noisy_actions = self.noise_scheduler.add_noise(
            actions, noise, timesteps)
        
        # predict the noise residual
        noise_pred = nets['policy']['noise_pred_net'](noisy_actions, timesteps, global_cond=obs_cond)
        
        # L2 loss
        all_l2 = F.mse_loss(noise_pred, noise, reduction='none')
        loss = (all_l2 * ~is_pad.unsqueeze(-1)).mean()

        loss_dict = {}
        loss_dict['l2_loss'] = loss
        loss_dict['loss'] = loss

        if self.training and self.ema is not None:
            self.ema.step(nets)
        return loss_dict
    else: # inference time
        To = self.observation_horizon
        Ta = self.action_horizon
        Tp = self.prediction_horizon
        action_dim = self.ac_dim
        
        nets = self.nets
        if self.ema is not None:
            nets = self.ema.averaged_model
        
        all_features = []
        for cam_id in range(len(self.camera_names)):
            cam_image = image[:, cam_id]
            cam_features = nets['policy']['backbones'][cam_id](cam_image)
            pool_features = nets['policy']['pools'][cam_id](cam_features)
            pool_features = torch.flatten(pool_features, start_dim=1)
            out_features = nets['policy']['linears'][cam_id](pool_features)
            all_features.append(out_features)

        obs_cond = torch.cat(all_features + [qpos], dim=1)

        # initialize action from Gaussian noise
        noisy_action = torch.randn(
            (B, Tp, action_dim), device=obs_cond.device)
        naction = noisy_action
        
        # init scheduler
        self.noise_scheduler.set_timesteps(self.num_inference_timesteps)

        for k in self.noise_scheduler.timesteps:
            # predict noise
            noise_pred = nets['policy']['noise_pred_net'](
                sample=naction, 
                timestep=k,
                global_cond=obs_cond
            )

            # inverse diffusion step (remove noise)
            naction = self.noise_scheduler.step(
                model_output=noise_pred,
                timestep=k,
                sample=naction
            ).prev_sample

        return naction

def serialize(self):
    return {
        "nets": self.nets.state_dict(),
        "ema": self.ema.averaged_model.state_dict() if self.ema is not None else None,
    }

def deserialize(self, model_dict):
    status = self.nets.load_state_dict(model_dict["nets"])
    print('Loaded model')
    if model_dict.get("ema", None) is not None:
        print('Loaded EMA')
        status_ema = self.ema.averaged_model.load_state_dict(model_dict["ema"])
        status = [status, status_ema]
    return status

`
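One detail of the training branch in the listing above: the loss multiplies the per-element error by `~is_pad` and then calls `.mean()`, which divides by every element, padded ones included. The framework-free sketch below illustrates the difference between that and averaging over unpadded entries only. The function names are hypothetical and the numbers are made up; this is an illustration of the arithmetic, not a claim that it explains the failed rollouts.

```python
# Framework-free illustration of masking before averaging.
# `masked_mean_all` mimics (loss * ~is_pad).mean(): padded entries count as zeros
# but still contribute to the denominator.
# `masked_mean_valid` divides by the number of unpadded entries instead.
# Both function names are hypothetical, not part of act-plus-plus.

def masked_mean_all(losses, is_pad):
    kept = [loss for loss, pad in zip(losses, is_pad) if not pad]
    return sum(kept) / len(losses)


def masked_mean_valid(losses, is_pad):
    kept = [loss for loss, pad in zip(losses, is_pad) if not pad]
    return sum(kept) / max(len(kept), 1)  # guard against an all-padded chunk


# Made-up example: four timesteps, the last two are padding.
per_step_loss = [0.4, 0.4, 0.4, 0.4]
is_pad = [False, False, True, True]

print(masked_mean_all(per_step_loss, is_pad))
print(masked_mean_valid(per_step_loss, is_pad))
```

With half of the chunk padded, the first average is half of the second, so a reported loss value depends on how much padding the batch contains.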

Training command: python3 imitate_episodes.py --task_name aloha_slide_exp1 --ckpt_dir C:/Users/aa/Desktop/act-main/ckpt --policy_class DiffusionPolicy --kl_weight 10 --chunk_size 100 --hidden_dim 512 --batch_size 4 --dim_feedforward 3200 --num_epochs 200 --lr 1e-4 --seed 0

Jul 08 '24 00:07 barsm42

@barsm42 Same question. Have you solved this problem?

Jul 10 '24 05:07 woltium

@woltium We are still trying to solve it. I trained two policies with the same parameters; the only difference is whether the line reads "ENABLE_EMA = True" or "ENABLE_EMA = False". We will evaluate both and check whether the EMAModel error appears during inference.
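For context on what the flag toggles, here is a minimal, framework-free sketch of EMA weight tracking: a shadow copy of the weights is blended toward the trained weights after each step. The warm-up decay rule and the default `power=0.75` are assumptions chosen for illustration, and `ema_decay` and `ema_update` are hypothetical names rather than functions from act-plus-plus or diffusers.

```python
# Minimal sketch of EMA weight tracking with a warm-up decay schedule.
# Assumption: decay = 1 - (1 + step / inv_gamma) ** -power, capped at max_value.
# `ema_decay` and `ema_update` are hypothetical helpers for illustration only.

def ema_decay(step, power=0.75, inv_gamma=1.0, max_value=0.9999):
    """Decay factor at a given optimizer step (0 at step 0, rising toward max_value)."""
    value = 1.0 - (1.0 + step / inv_gamma) ** (-power)
    return max(0.0, min(value, max_value))


def ema_update(shadow, weights, step, **kwargs):
    """Return new shadow weights after blending in the current trained weights."""
    decay = ema_decay(step, **kwargs)
    return [decay * s + (1.0 - decay) * w for s, w in zip(shadow, weights)]


# Toy run: the trained weight sits at 1.0 while the shadow copy starts at 0.0.
shadow = [0.0]
for step in range(1, 101):
    shadow = ema_update(shadow, [1.0], step)

print(shadow[0])  # close to, but still below, the trained value of 1.0
```

Inference with EMA enabled uses the shadow weights rather than the trained ones, which is why the two checkpoints can behave differently even with identical training parameters.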

Inference ran with the "ENABLE_EMA = False" setting, but the results were not good. It seems we need to investigate further on our side.

With "ENABLE_EMA = True", it raises an error on the "nets = self.ema.averaged_model" line, and the robots don't move.

Jul 11 '24 00:07 barsm42