Shouldn't manager_vf be function of x_t?
Right after eq. (7) in the paper (https://arxiv.org/pdf/1703.01161.pdf), the authors define V_t as a function of x_t. However, in the code it is a function of g_hat (feudal_policy.py -> _build_manager()):
self.manager_vf = self._build_value(g_hat)
Shouldn't it be a function of x_t?
That's correct; we caught that late in this development cycle. We still couldn't get anything to converge with that value function, but you're right that it should be built from the visual input.
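To make the distinction concrete, here is a minimal NumPy sketch of the two options, not the repo's actual code; the names z_t, W_z, W_g, and w_v are made up for illustration, and only manager_vf, _build_value, and g_hat come from feudal_policy.py:

```python
import numpy as np

# Toy stand-ins for the pieces discussed above; all weights and the name z_t
# (the perceptual embedding of x_t) are assumptions for illustration only.
rng = np.random.default_rng(0)
d_x, d_z, d_g = 16, 8, 8

W_z = rng.normal(scale=0.1, size=(d_x, d_z))  # perception: x_t -> z_t
W_g = rng.normal(scale=0.1, size=(d_z, d_g))  # manager goal head: z_t -> g_hat_t
w_v = rng.normal(scale=0.1, size=d_z)         # linear value head

x_t = rng.normal(size=d_x)                    # observation x_t
z_t = np.tanh(x_t @ W_z)                      # shared perceptual features of x_t
g_hat_t = np.tanh(z_t @ W_g)                  # manager's (unnormalized) goal

# The issue in one line:
#   current code:    self.manager_vf = self._build_value(g_hat)
#   eq. (7) reading: V_t should be a function of x_t, i.e. of the perceptual
#                    features, not of the goal output g_hat.
manager_vf = float(z_t @ w_v)                 # value built from features of x_t
print(manager_vf)
```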
Hi, I didn't understand your answer. Are you still trying to implement it, or did you abandon this repository?
Thanks
I'll make a formal post about it, since I haven't looked at this in a bit (I've been busy elsewhere), but as of now I've talked with some researchers and we have reason to believe that FeUdal networks will not converge as described in the original paper.
In the coming weeks I'll try to consolidate the code base and clean it up as well as possible, in case (1) new details are published on how to actually train these networks, or (2) someone else figures out a magic bullet.
Thanks for the quick and detailed response.
If I can offer any help: I haven't seen an implementation of the "dilated LSTM" in your code (or maybe I missed it?). I think it's a core idea in this paper, for two reasons:
- Had the method worked without it, I don't believe it would have been included in the paper.
- Without it, there is no mechanism that controls the time interval c, so the goal objective is ill-defined. I believe it is supposed to act as some kind of finite-state machine.
I do have a dilated LSTM implemented; it's in a branch (see dlstm_fix). Hence the need for consolidating! You can find it in models/models.py.
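For anyone who doesn't want to dig through the branch, here is a minimal NumPy sketch of the dilation mechanism itself (not the repo's implementation, and with a plain tanh cell standing in for the LSTM core): the recurrent state is split into r separate cores, only core t mod r is updated at step t, and the output is pooled over all cores, which is what lets the manager operate at the coarser time scale c.

```python
import numpy as np

class DilatedRNNSketch:
    """Illustration of the dilated recurrence from the FuN paper: r separate
    cores, update only core (t mod r) each step, pool outputs over all cores.
    A tanh-RNN cell stands in for the LSTM core to keep the sketch short."""

    def __init__(self, input_dim, hidden_dim, r=10, seed=0):
        rng = np.random.default_rng(seed)
        self.r = r
        self.W_x = rng.normal(scale=0.1, size=(input_dim, hidden_dim))
        self.W_h = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
        self.h = np.zeros((r, hidden_dim))   # one hidden state per core
        self.t = 0

    def step(self, z_t):
        i = self.t % self.r                  # which core is active this step
        self.h[i] = np.tanh(z_t @ self.W_x + self.h[i] @ self.W_h)
        self.t += 1
        # Pooling over the cores mixes the last r updates, so the output
        # changes slowly relative to the per-step worker dynamics.
        return self.h.mean(axis=0)

# Usage: feed the manager's perceptual embedding z_t at every step; the paper
# ties the dilation radius r to the horizon c (r = c = 10 in its experiments).
drnn = DilatedRNNSketch(input_dim=8, hidden_dim=16, r=10)
rng = np.random.default_rng(1)
for t in range(30):
    g_raw = drnn.step(rng.normal(size=8))    # would be normalized to give g_t
```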
Ohh, I didn't check the other branches. Anyhow, that's truly frustrating, because this model sounds really tempting (and I guess you put a lot of work into it). Good luck!
@dmakian You mentioned a formal post about the convergence problems and the idea that feudal networks may not converge as described in the paper. Any update there?
Sorry, this has been off my mind for a while. I doubt a post is coming.
In short: I believe that feudal networks can work; I just think the implementation is fragile. If DeepMind released their code, I'm sure it would function, but as with a lot of deep RL, small differences in code can lead to wildly different performance.