
Question regarding Model State

Open sahpat229 opened this issue 6 years ago • 5 comments

I read through your PortfolioEnv code and your DDPG code. It seems to me that the state of your model only incorporates the window of price returns for each asset and does not incorporate the previous portfolio allocation. This is reflected in how the commission is computed in PortfolioEnv, where you have dw1 = (y1 * w1) / (np.dot(y1, w1) + eps), whereas the PortfolioEnv implemented by wassname has dw1 = (y1 * w0) / (np.dot(y1, w0) + eps). Can you explain why you calculate the commission in this manner? Much appreciated, thanks.
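
For context, a minimal numpy sketch of the two lines being compared; only the two dw1 formulas are quoted from the respective repos, while the example values of y1, w0, w1, and eps are illustrative:

```python
import numpy as np

eps = 1e-8
y1 = np.array([1.00, 1.02, 0.99])   # price relatives over the step (cash first), illustrative
w0 = np.array([0.50, 0.25, 0.25])   # previous allocation
w1 = np.array([0.10, 0.60, 0.30])   # newly chosen allocation

# this repo: the new weights w1 drifted by the price move y1
dw1_here = (y1 * w1) / (np.dot(y1, w1) + eps)

# wassname's PortfolioEnv: the previous weights w0 drifted by y1
dw1_wassname = (y1 * w0) / (np.dot(y1, w0) + eps)
```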

sahpat229 avatar Mar 14 '18 18:03 sahpat229

The trading model is this: at timestamp t, allocate the portfolio (denoted w1), buy using the open price, and sell the entire portfolio using the close price at timestamp t, and so on. So at the beginning of each timestamp, we assume you hold only cash and no assets. It's actually a cycle of distribute, gather, and redistribute.
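
In code, one such timestep would look roughly like this (a toy sketch for illustration, not the actual PortfolioEnv implementation; it assumes w1 carries a cash weight at index 0):

```python
import numpy as np

def one_timestep(cash, w1, open_prices, close_prices):
    """Illustrative cycle: hold only cash, distribute it across assets at the
    open, then gather everything back into cash at the close."""
    shares = cash * w1[1:] / open_prices                      # distribute at the open
    cash_next = cash * w1[0] + np.dot(shares, close_prices)   # gather at the close
    return cash_next                                          # next step redistributes from here
```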

vermouth1992 avatar Mar 14 '18 19:03 vermouth1992

I see, so your commission cost in this scenario is the percentage of the profit that is treated as commission. Is that correct?

sahpat229 avatar Mar 14 '18 19:03 sahpat229

Yes.

vermouth1992 avatar Mar 14 '18 19:03 vermouth1992

A couple of other questions I have:

  1. If that is the profit scheme, why is the cost calculated as self.cost * (np.abs(dw1 - w1)).sum() instead of, say, w2 = np.zeros(len(self.asset_names)+1); w2[0] = 1; cost = self.cost * (np.abs(dw1 - w2)).sum()? It seems to me that the first form, which is what you have, treats the commission as the cost of reallocating from dw1 back to w1 rather than of selling all assets and holding only cash (see the sketch after this list).

  2. In the DDPG training, you compute action = predicted_action + self.actor_noise(), where I have replaced the agent's prediction with predicted_action. This can violate the 0-to-1 bounds on each element of action and the constraint that action.sum() == 1. In PortfolioEnv you correct for this by clipping the action to 0 to 1 and normalizing it by its sum, but when adding the action taken to the ReplayBuffer, you add the original action, not the normalized one that the environment actually executed. Is there a specific reason you do this?

  3. In the way the current environment is set up, the action w1 only affects the immediate reward the agent observes and has no effect on the future states and rewards the agent observes. Is there a reason you chose gamma, in this case, to be 0.99? Please correct me if my understanding of the action's impact on the environment is wrong.
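
To make point 1 concrete, here is a side-by-side sketch of the two cost expressions; only the formulas themselves come from your code and my suggestion above, and the example values are illustrative:

```python
import numpy as np

eps = 1e-8
asset_names = ["A", "B"]                  # illustrative
y1 = np.array([1.00, 1.03, 0.97])         # price relatives, cash first
w1 = np.array([0.10, 0.50, 0.40])         # chosen allocation
dw1 = (y1 * w1) / (np.dot(y1, w1) + eps)  # weights after the price move
commission_rate = 0.0025                  # illustrative value for self.cost

# current code: commission as if reallocating from dw1 back to w1
cost_current = commission_rate * (np.abs(dw1 - w1)).sum()

# suggested alternative: commission for selling everything back to cash
w2 = np.zeros(len(asset_names) + 1)
w2[0] = 1
cost_all_cash = commission_rate * (np.abs(dw1 - w2)).sum()
```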

Thank you for taking the time to answer these questions.

sahpat229 avatar Mar 15 '18 18:03 sahpat229

  1. There may be something wrong with the commission calculation. We were in a hurry and didn't focus much on the trading model, but it doesn't affect the algorithm.
  2. I think in the original paper, the action added to the replay buffer is just a_t, and I have seen classic-control DDPG implementations do the same. But I guess you can actually try adding the valid action, since it amounts to adding another kind of random noise to the action (a sketch follows this list).
  3. Yes, you are correct. This is actually not the same as the traditional robotics case. I set gamma to 0.99 just because I wrote this as a generic DDPG algorithm at the very beginning, not for this specific task. I agree that gamma will not affect the outcome.
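
If one did want to store the valid action, as suggested in point 2, the change would look roughly like this; names such as actor, actor_noise, replay_buffer, and env are placeholders for the corresponding objects in the training loop, not the exact API:

```python
import numpy as np

def act_and_store(actor, actor_noise, replay_buffer, env, s):
    """Sketch: add exploration noise, project back onto the simplex the same
    way PortfolioEnv does, and store the action that was actually executed."""
    action = actor.predict(s) + actor_noise()
    action = np.clip(action, 0.0, 1.0)
    action = action / (action.sum() + 1e-8)     # normalize to sum to 1
    s2, r, done, info = env.step(action)
    replay_buffer.add(s, action, r, done, s2)   # store the executed action
    return s2, r, done
```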

vermouth1992 avatar Mar 16 '18 07:03 vermouth1992