deep_rl_trader

I think there is a look-ahead bias

mg64ve opened this issue 5 years ago · 1 comment

Hi there, nice work. However, I think there is a look-ahead bias. At every timestep you get a state, and this state includes the current close price. Then, in the step method, you calculate the profit as:

```python
self.exit_price = self.closingPrice
# reward from the current close, net of fees on both the entry and exit legs
self.reward += ((self.entry_price - self.exit_price)/self.exit_price + 1)*(1-self.fee)**2 - 1
```

In this case you are using the same information that you already used to predict the action. What do you think about it?
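To make the formula above concrete, here is a toy sanity check of the quoted reward expression (the example numbers and the helper name `short_reward` are mine, not from the repo; the formula reads like a short-side payoff, since the reward is positive when the exit price is below the entry price):

```python
def short_reward(entry_price, exit_price, fee):
    # Same expression as in the repo's step():
    # relative gain (entry - exit)/exit of a closed short,
    # with the trading fee charged on both legs via (1 - fee)**2.
    return ((entry_price - exit_price) / exit_price + 1) * (1 - fee) ** 2 - 1

# Short at 100, cover at 90, 0.05% fee per leg -> roughly +11%
r = short_reward(100.0, 90.0, 0.0005)
```

The exit price here is exactly the close price that was also part of the state the agent acted on, which is the heart of the question above.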

mg64ve avatar Aug 12 '19 06:08 mg64ve

```
state_n        <- updateState()
action_n       <- network(state_n)
reward_n       <- compute_reward(action_n, state_n)

state_n_plus_1 <- updateState()
action_n_plus_1 <- network(state_n_plus_1)
reward_n_plus_1 <- compute_reward(action_n_plus_1, state_n_plus_1)
```

The reward is computed from the current state, so there is no look-ahead bias. To put it simply: one decides to sell the stock based on past and current prices, and if one does sell, the earnings are calculated from the current price.
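The ordering described above can be sketched as a minimal loop (my own illustration with made-up prices and a stand-in policy, not the repo's code): the reward for action `a_t` is settled at the same close price the agent saw in state `s_t`, never at `s_{t+1}`.

```python
prices = [100.0, 101.0, 99.0, 102.0]  # toy close prices
entry_price = prices[0]

def compute_reward(action, current_price, entry_price):
    # Settle a "sell" at the price the agent could actually observe.
    if action == "sell":
        return (current_price - entry_price) / entry_price
    return 0.0

rewards = []
for t in range(1, len(prices)):
    state_t = prices[: t + 1]                 # past and current prices only
    current_price = state_t[-1]
    # stand-in policy: sell whenever we are in profit
    action_t = "sell" if current_price > entry_price else "hold"
    reward_t = compute_reward(action_t, current_price, entry_price)
    rewards.append(reward_t)
```

Whether this absolves the environment of look-ahead bias depends on whether the *next* decision also only sees prices up to its own timestep, which is what the loop above enforces.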

miroblog avatar Dec 31 '19 20:12 miroblog