
Question

Open joaosalvado10 opened this issue 6 years ago • 17 comments

Hello, this seems like nice work. I have some questions, though. When you're comparing on the test data, what does the market value refer to? In your view, is this a good approach?

joaosalvado10 avatar Dec 15 '17 17:12 joaosalvado10

The market value refers to the return of an equal distribution of your current investment volume. I think it would be better to incorporate news data into the model, as it is very useful for indicating sharp transitions in the market.

vermouth1992 avatar Dec 24 '17 06:12 vermouth1992
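The equal-distribution baseline described above can be sketched as follows; the function name and array shapes are illustrative and not taken from the repo:

```python
import numpy as np

def equal_weight_market_value(close_open_ratios, initial_value=1.0):
    """Track the value of a portfolio that splits the investment
    equally across all assets and rebalances every trading day.

    close_open_ratios: array of shape (T, n_assets), where each
    entry is close price / open price for that asset on that day.
    """
    value = initial_value
    history = [value]
    for day in close_open_ratios:
        # with equal weights, the day's growth factor is the mean ratio
        value *= np.mean(day)
        history.append(value)
    return history

# two days, two assets: flat on day 1, +1% on average on day 2
print(equal_weight_market_value(np.array([[1.01, 0.99], [1.02, 1.00]])))
```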

How did you include news? And did you include news for each stock separately?

joaosalvado10 avatar Dec 28 '17 15:12 joaosalvado10

We didn't include the news in this course project due to time limits. The general idea is to predict a sentiment label (positive, negative, neutral) for each stock at each timestamp and use it to guide the predictions produced from price values. The key assumption is that the stock market follows the statistics of historical values until a certain turning point happens, which can be reflected in the news.

vermouth1992 avatar Jan 02 '18 04:01 vermouth1992
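The gating idea described above (use sentiment to override the price-based prediction around turning points) could look roughly like this. The function and the sentiment encoding are assumptions for illustration, since this was never implemented in the project:

```python
def gate_weights_by_sentiment(weights, sentiments):
    """Zero out the allocation of any asset whose news sentiment is
    negative, effectively moving that part of the budget to cash.

    sentiments: one value per asset, e.g. +1 positive, 0 neutral,
    -1 negative, produced by a separate sentiment model (assumed).
    """
    return [w if s >= 0 else 0.0 for w, s in zip(weights, sentiments)]

# keep the first and third positions, drop the one with negative news
print(gate_weights_by_sentiment([0.5, 0.3, 0.2], [1, -1, 0]))
```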

So in order to include the news, it would be necessary to change the model and train it again, right? At the moment the model is only capable of receiving the open/high/low/close of the 16 stocks, right? How can I retrain the model?

Also, I found that the model always tries to buy only one stock instead of distributing the money. This could be a nice approach; however, isn't it a bit risky?

How would you ensemble the imitation learning and the DDPG? What kinds of improvements do you think should be made, besides ensembling and using the Sharpe ratio instead of the realized profit?

joaosalvado10 avatar Jan 03 '18 17:01 joaosalvado10

Yes, the result of imitation learning is to just buy one stock. We tried to optimize the Sharpe ratio directly, but it turned out to be a very difficult problem since it's not a standard MDP. To include news, you have to build another separate model. It would take a considerable amount of time to implement because collecting and processing the dataset is not easy in the first place.

vermouth1992 avatar Jan 04 '18 09:01 vermouth1992

Yes, but by the time I had, say, the sentiment of the news, I think it would not take that much effort. Also, I found that this model is only capable of buying (going long); it is not capable of selling (going short). Wouldn't it make sense to include that in the model?

joaosalvado10 avatar Jan 04 '18 10:01 joaosalvado10

There is an assumed trading rule: buy using the open price, sell all the holdings at the close price on the same day, and repeat.

vermouth1992 avatar Jan 04 '18 16:01 vermouth1992

Yes, I know, but besides buying a stock it is also possible to short one, which is the opposite of buying. To do this, the output layer of the agent would have to be capable of outputting values between -1 and 1 instead of 0 and 1, and the reward function would also need to be changed accordingly. It would be good to include this because if there is a crash in the stock market, it is important that the agent stop buying and start shorting instead.

When I said that the agent always outputs 1 for one stock, I was referring to the DDPG agent; I mean the agent does not output exactly 1, but it outputs a value really close to 1 for one stock while outputting values really close to zero for the other stocks.

I found this, have a look: https://github.com/ZhengyaoJiang/PGPortfolio

joaosalvado10 avatar Jan 05 '18 09:01 joaosalvado10
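One way to realize the signed-output idea discussed above is to squash the actor's raw outputs with tanh and normalize by the sum of absolute values, so that negative entries represent short positions. This is a sketch of the suggestion, not code from this repo:

```python
import numpy as np

def signed_weights(raw_actions):
    """Map unbounded actor outputs to signed portfolio weights.

    A negative weight is a short position. Weights are scaled so
    that sum(|w|) == 1, i.e. the full budget is deployed.
    """
    signed = np.tanh(raw_actions)        # squash into (-1, 1)
    total = np.sum(np.abs(signed))
    if total == 0:
        return np.zeros_like(signed)     # no signal -> hold cash
    return signed / total

w = signed_weights(np.array([2.0, -1.0, 0.5]))
```

The reward would then use the signed return, e.g. `sum(w * (close/open - 1))`, so a short position profits when the price falls.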

I changed the code so that instead of feeding the network only the close/open ratio, it is fed the open/high/low/close and stock news; however, when I do this the network does not learn anymore. Is there a reason why you feed only the close/open ratio, and how would you change this?

joaosalvado10 avatar Jan 10 '18 14:01 joaosalvado10

You need to normalize the data first.

vermouth1992 avatar Jan 10 '18 17:01 vermouth1992
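As a sketch of the normalization step suggested above: raw OHLC prices live on very different scales per asset, so one common option is to express each price window relative to a reference price. This is an illustrative helper under that assumption, not the repo's code:

```python
import numpy as np

def normalize_window(window):
    """Express an (T, n_assets, 4) OHLC window relative to each
    asset's first open price, so all features become small numbers
    around zero regardless of the absolute price level.
    """
    base = window[0:1, :, 0:1]   # first open price per asset
    return window / base - 1.0

window = np.array([[[10.0, 11.0, 9.0, 10.5]],
                   [[10.5, 12.0, 10.0, 11.0]]])
print(normalize_window(window))
```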

I did normalize it. How would you give the network the possibility of shorting a stock? I think that is also important.

joaosalvado10 avatar Jan 10 '18 18:01 joaosalvado10

Also, I found that during testing the output of the network is always almost 1 for one stock and zero for the others. This cannot be the best choice, so do you know how to solve this?

joaosalvado10 avatar Jan 12 '18 16:01 joaosalvado10

The best choice is actually investing everything in only the one stock with the highest return, if we knew the future. The reason we split our investment is to avoid the risk of false predictions. If you are running imitation learning, then this is actually expected. For DDPG, it's very hard to tell; you need to look at the actual trading performance.

vermouth1992 avatar Jan 12 '18 18:01 vermouth1992

In fact that is right; however, when I looked at the output produced by the DDPG, it seems like it always makes the same decision. (I am using the pretrained algorithm with different stocks, trying to see if it is capable of generalizing.)

joaosalvado10 avatar Jan 15 '18 09:01 joaosalvado10

Maybe you want to take a look at the input scale. The pretrained model assumes the input is the close/open ratio, normalized as (x - 1) / 100.

vermouth1992 avatar Jan 15 '18 22:01 vermouth1992
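The input convention for the pretrained model, as described above, can be written down directly (the function name is illustrative):

```python
def normalize_observation(close_open_ratio):
    """Normalization stated for the pretrained model: shift the
    close/open ratio to be centered at 0 and shrink its scale.
    """
    return (close_open_ratio - 1.0) / 100.0

# a 2% daily gain (ratio 1.02) maps to a small value near 2e-4
print(normalize_observation(1.02))
```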

Yeah, I realized that, but when predict single is called it normalizes the observation. I also realized that when I trained the DDPG model it is quite difficult to generalize; I cannot replicate your results when using my own dataset.

joaosalvado10 avatar Jan 16 '18 11:01 joaosalvado10

Hello, I could not find how the pretrained model is generated. Is something missing?

Thanks.

yanxurui avatar Feb 18 '19 21:02 yanxurui