
Training Data / Validation Data Overlap?

Open rustyju opened this issue 6 years ago • 2 comments

I noticed that in the /data folder, the training data in /train includes all of the validation data in /test. The model has no validation split, so I assume the validation data points can also be seen by the model during training. Doesn't that lead to overfitting and exaggerated model performance?
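One way to check this kind of overlap is to compare the row keys (for example timestamps) of the two splits, and to split chronologically so that test rows come strictly after training rows. The sketch below is only an illustration: the helper names `overlapping_keys` and `chronological_split` are hypothetical, and the integer lists stand in for the real timestamp columns, whose format is not shown in this thread.

```python
def overlapping_keys(train_keys, test_keys):
    """Return the sorted keys that appear in both splits."""
    return sorted(set(train_keys) & set(test_keys))


def chronological_split(rows, test_fraction=0.2):
    """Split time-ordered rows so test rows come strictly after training rows."""
    cut = int(len(rows) * (1 - test_fraction))
    return rows[:cut], rows[cut:]


# Toy stand-ins for the timestamp columns (hypothetical values).
train_ts = list(range(0, 10))
test_ts = list(range(0, 4))

# Every test key also appears in the training keys, as described above.
print(overlapping_keys(train_ts, test_ts))

# After a chronological split of one series, the two parts share no keys.
train_part, test_part = chronological_split(train_ts, test_fraction=0.2)
print(overlapping_keys(train_part, test_part))
```

An empty overlap list is what a leak-free split would look like; any non-empty result means some evaluation rows were also available during training.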

rustyju avatar Jun 16 '19 02:06 rustyju

Please correct me if I'm wrong, @rustyju, but I think nb_max_episode_steps in the .fit() method limits the maximum number of steps per episode (it was set to 10,000). Although the files themselves overlap, I think the agent never sees the data after the first 10K ticks during training.

xiaoyongzhu avatar Apr 21 '20 04:04 xiaoyongzhu

@xiaoyongzhu I found this logic for nb_max_episode_steps in core.py of the keras-rl library:

if nb_max_episode_steps and episode_step >= nb_max_episode_steps - 1:
    # Force a terminal state.
    done = True

It means that an episode is forced to end once episode_step reaches nb_max_episode_steps - 1, i.e. after nb_max_episode_steps steps. The training data covers 0 to 70K and the test data covers 0 to 16K, so even with the 10K step limit they still share a common data range, 0~10K.
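To make the arithmetic concrete, here is a minimal stand-alone sketch that reuses the same termination condition. It is not keras-rl itself: `steps_in_episode` is a hypothetical helper, and it assumes each episode starts at row 0 and advances one row per step, which depends on how the environment's reset is implemented. The 70K and 16K row counts are the approximate figures mentioned above.

```python
def steps_in_episode(nb_max_episode_steps, data_length):
    """Count the steps taken in one episode under the forced-terminal check."""
    episode_step = 0
    done = False
    while not done:
        # Assumption: the environment ends the episode when the data runs out.
        if episode_step >= data_length - 1:
            done = True
        # Same condition as the keras-rl snippet quoted above.
        if nb_max_episode_steps and episode_step >= nb_max_episode_steps - 1:
            done = True
        episode_step += 1
    return episode_step


TRAIN_ROWS = 70_000  # approximate size of the training file
TEST_ROWS = 16_000   # approximate size of the test file

train_steps = steps_in_episode(10_000, TRAIN_ROWS)
# Rows visited in training that also fall inside the test range,
# if both files start at the same row 0.
shared_rows = min(train_steps, TEST_ROWS)
print(train_steps, shared_rows)
```

Under these assumptions the agent trains on rows 0 to 9,999, and all of them lie inside the test range, so the step limit narrows the overlap but does not remove it.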

puke3615 avatar Jul 22 '22 09:07 puke3615