feudal_networks icon indicating copy to clipboard operation
feudal_networks copied to clipboard

Shouldn't manager_vf be function of x_t?

Open imbalu007 opened this issue 8 years ago • 8 comments

Right after eq.(7) in the paper, the authors say V_t as a function of x_t. However, in the code it is a function of g_hat (feudal_policy.py->_build_manager()), self.manager_vf = self._build_value(g_hat) Shouldn't it be a function of x_t?

imbalu007 avatar Sep 11 '17 06:09 imbalu007

That's correct, we caught that late in this development cycle. We still couldn't get anything to converge with that value function, but you're correct that it should be built from the visual input.

On Mon, Sep 11, 2017 at 2:17 AM, imbalu007 [email protected] wrote:

Right after eq.(7) in the paper https://arxiv.org/pdf/1703.01161.pdf, the authors say V_t as a function of x_t. However, in the code it is a function of g_hat (feudal_policy.py->_build_manager()), self.manager_vf = self._build_value(g_hat) Shouldn't it be a function of x_t?

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/dmakian/feudal_networks/issues/2, or mute the thread https://github.com/notifications/unsubscribe-auth/ALHLEtC3itFyh26hIcrDFt4nhyDlaemDks5shNBsgaJpZM4PSrxj .

davidhershey avatar Sep 11 '17 10:09 davidhershey

Hi, I didn't understand your answer - Are you still trying to implement it or did you abandon this repository?

Thanks

shanlior avatar Sep 13 '17 13:09 shanlior

I'll make a formal post about it as I haven't looked at this in a bit (been busy elsewhere), but as of now I've talked with some researchers and we have reason to believe that FeUdal networks will not converge as described in the original paper.

In the coming weeks I'll try to consolidate the code-base and clean it up as well as possible in case (1) new details are published on how to actually train these networks or (2) someone else can figure out a magic bullet.

On Wed, Sep 13, 2017 at 9:25 AM, Lior Shani [email protected] wrote:

Hi, I didn't understand your answer - Are you still trying to implement it or did you abandon this repository?

Thanks

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/dmakian/feudal_networks/issues/2#issuecomment-329166772, or mute the thread https://github.com/notifications/unsubscribe-auth/ALHLEk_RL2ZO9KS034nHLP3_R2q-llVwks5sh9evgaJpZM4PSrxj .

davidhershey avatar Sep 13 '17 13:09 davidhershey

Thanks for the quick and detailed response.

If I can offer any help, I haven't seen an implementation of the "dilated lstm" in your code (or maybe I missed it?). I think it's a core idea in this paper for two reasons:

  1. Had it worked without it, I don't believe it was in the paper.
  2. Without it, there is no mechanism that controls the time interval c, so the goals objective is ill-defined. I believe it suppose to act as some kind of a finite-state machine.

shanlior avatar Sep 13 '17 18:09 shanlior

I do have a dilated LSTM implemented, its in a branch (see dlstm_fix). Hence the need for consolidating! You can find it in models/models.py

On Wed, Sep 13, 2017 at 2:28 PM, Lior Shani [email protected] wrote:

Thanks for the quick and detalied response.

If I can offer any help, I haven't seen an implementation of the "dilated lstm" in your code (or maybe I missed it?). I think it's a core idea in this paper for two reasons:

  1. Had it worked without it, I don't believe it was in the paper.
  2. Without it, there is no mechanism that controls the time interval c, so the goals objective is ill-defined. I believe it suppose to act as some kind of a finite-state machine.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/dmakian/feudal_networks/issues/2#issuecomment-329256012, or mute the thread https://github.com/notifications/unsubscribe-auth/ALHLEuxbIhqJcBQv2cTRXgc8gxB6QCuSks5siB7BgaJpZM4PSrxj .

davidhershey avatar Sep 13 '17 20:09 davidhershey

Ohh. I didn't check other branches. Anyhow, that's truly frustrating because this model sounds really tempting (and I guess you put a lot of work into it). Good luck

shanlior avatar Sep 14 '17 07:09 shanlior

@dmakian You mentioned a formal post about convergence problems and the idea feudal networks may not converge as described in the paper. Any update there?

KadeG avatar Feb 26 '18 15:02 KadeG

Sorry this has been off my mind for a while. I doubt a post is coming.

In short: I believe that feudal networks can work, I just think that the implementation is fragile. If DeepMind released their code I'm sure it would function, but as with a lot of Deep RL small differences in code can lead to wildly different performance.

davidhershey avatar Feb 26 '18 15:02 davidhershey