Vid-ODE
Why does the ConvGRU have only one layer?
From:
conv_odegru.py (line 113):
first_point_mu, first_point_std = self.encoder_z0(input_tensor=e_truth, time_steps=truth_time_steps, mask=mask, tracker=self.tracker)
base_conv_gru.py (line 127):
last_yi, latent_ys = self.run_ode_conv_gru(input_tensor=input_tensor, mask=mask, time_steps=time_steps, run_backwards=self.run_backwards, tracker=tracker)
and base_conv_gru.py (line 161):
inc = self.z0_diffeq_solver.ode_func(prev_t, prev_input_tensor) * (t_i - prev_t)
as well as line 180:
# only 1 now
yi = self.cell_list[0](input_tensor=xi, h_cur=yi_ode, mask=mask[:, i])
and conv_odegru.py (line 59):
self.encoder_z0 = Encoder_z0_ODE_ConvGRU(input_size=input_size, input_dim=base_dim, hidden_dim=base_dim, kernel_size=(3, 3), num_layers=1, dtype=torch.cuda.FloatTensor if self.device == 'cuda' else torch.FloatTensor, batch_first=True, bias=True, return_all_layers=True, z0_diffeq_solver=z0_diffeq_solver, run_backwards=self.opt.run_backwards).to(self.device)
where num_layers=1
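Reading the quoted lines together, the encoder seems to interleave an explicit Euler step of the latent between observation times (line 161) with a single ConvGRU cell update at each observed frame (line 180). A minimal scalar sketch of that loop, just to check my understanding; `ode_func` and `gru_cell` here are toy stand-ins, not the repo's learned modules:

```python
def ode_func(t, y):
    # Toy dynamics standing in for the learned CNN ODE function.
    return -0.5 * y

def gru_cell(x, h):
    # Toy stand-in for self.cell_list[0]: blend hidden state with input
    # using a fixed "update gate" of 0.8, purely for illustration.
    z = 0.8
    return z * h + (1 - z) * x

def encode(inputs, time_steps):
    """Single-layer ODE-ConvGRU encoder loop, with scalars standing in
    for the 32x32 spatial latent tensors."""
    h = 0.0
    prev_t = time_steps[0]
    for x, t_i in zip(inputs, time_steps):
        # Explicit Euler step between observation times (cf. line 161):
        # inc = ode_func(prev_t, prev_input_tensor) * (t_i - prev_t)
        h = h + ode_func(prev_t, h) * (t_i - prev_t)
        # Single GRU cell update at the observation (cf. line 180).
        h = gru_cell(x, h)
        prev_t = t_i
    return h

h0 = encode([1.0, 2.0, 3.0], [0.0, 0.5, 1.0])
```

So the whole recurrence lives in one hidden state at one resolution, which is what prompted the question.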
It seems that the ODE solver is applied to only a single latent "layer" of the ConvGRU. The ConvGRU is not stacked and operates on a single latent at a spatial resolution of 32x32, since num_layers in Encoder_z0_ODE_ConvGRU is 1.
Why are no further layers used? Isn't the whole point of a ConvGRU to utilize latents at multiple resolutions? Or am I missing something?
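For contrast, this is what I would have expected with num_layers > 1: each layer's hidden-state sequence feeds the next layer's cell, as in a standard stacked ConvGRU/ConvLSTM encoder. A hypothetical scalar sketch (not code from the repo) of the stacking that num_layers=1 disables:

```python
def make_cell(z):
    # Toy GRU cell with a fixed "update gate" z, a hypothetical
    # stand-in for a ConvGRU cell operating on a spatial latent.
    def cell(x, h):
        return z * h + (1 - z) * x
    return cell

def run_stacked(inputs, cells):
    """Feed each layer's output sequence to the next layer, the way a
    stacked ConvGRU would; with num_layers=1 only cells[0] would run."""
    seq = inputs
    for cell in cells:
        h, out = 0.0, []
        for x in seq:
            h = cell(x, h)
            out.append(h)
        seq = out  # layer k's hiddens become layer k+1's inputs
    return seq

out = run_stacked([1.0, 1.0], [make_cell(0.5), make_cell(0.5)])
```

With only cell_list[0] ever called (line 180), none of this deeper feature hierarchy is used, unless the downsampling in the frame encoder is meant to replace it?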