tforce_btc_trader
Actions exploration
I'm working outside of hypersearch right now, so these are probably not ideal parameters. With actions exploration defined, the model seems to become a little more tolerant of less-than-perfect parameters (and of the randomness in its initial state).
These parameters come from the example at https://reinforce.io/blog/introduction-to-tensorforce/ and are not optimized:
actions_exploration=dict(
    type='ornstein_uhlenbeck',
    sigma=0.1,
    mu=0.0,
    theta=0.1
),
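For intuition about what these Ornstein-Uhlenbeck settings do, here is a minimal stdlib sketch of a discretized OU noise process. It assumes the common unit-time-step update x <- x + theta*(mu - x) + sigma*N(0, 1); I believe Tensorforce's exploration is close to this, but the function name `ou_noise` and the exact update form are illustrative assumptions, not Tensorforce's API.

```python
import random


def ou_noise(steps, sigma=0.1, mu=0.0, theta=0.1, x0=None, seed=None):
    """Generate `steps` samples of discretized Ornstein-Uhlenbeck noise.

    Assumed update (unit time step, hypothetical helper):
        x <- x + theta * (mu - x) + sigma * N(0, 1)
    """
    rng = random.Random(seed)
    x = mu if x0 is None else x0
    samples = []
    for _ in range(steps):
        # Pull toward the mean mu at rate theta, then add Gaussian noise scaled by sigma.
        x += theta * (mu - x) + sigma * rng.gauss(0.0, 1.0)
        samples.append(x)
    return samples


if __name__ == "__main__":
    # Example parameters from the Tensorforce blog post.
    noise = ou_noise(10, sigma=0.1, mu=0.0, theta=0.1, seed=0)
    print([round(n, 3) for n in noise])
```

Because each sample starts from the previous one, the noise drifts smoothly instead of jumping independently every step; in an exploration setting it would be added to the agent's chosen action.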
Any benefit to adding parameters for actions exploration to hypersearch?
I'm testing this modification to hypersearch.py. I had to clear the runs database, so it will be a while before I can tell whether it affected anything.
hypers['agent'] = {
    # 'states_preprocessing': None,
    # 'actions_exploration': None,
    'actions_exploration.type': 'ornstein_uhlenbeck',
    'actions_exploration.sigma': {
        'type': 'bounded',
        'vals': [0., 1.],
        'guess': .2,
        'hydrate': min_threshold(.05, None)
    },
    'actions_exploration.mu': {
        'type': 'bounded',
        'vals': [0., 1.],
        'guess': .2,
        'hydrate': min_threshold(.05, None)
    },
    'actions_exploration.theta': {
        'type': 'bounded',
        'vals': [0., 1.],
        'guess': .2,
        'hydrate': min_threshold(.05, None)
    },
    # 'reward_preprocessing': None,
    # I'm pretty sure we don't want to experiment with anything less than .99 for non-terminal reward types (which are 1.0).
    # .99^500 ~= .6%, so value is lost sooner than makes sense for our trading horizon. A trade now could affect
    # something 2-5k steps later, so .999 is more like it (.999^5000 ~= .6%).
    'discount': 1.,  # {
    #     'type': 'bounded',
    #     'vals': [.9, .99],
    #     'guess': .97
    # },
}
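The arithmetic in the discount comment can be checked directly. This short sketch computes the weight a reward k steps ahead keeps under discount gamma (gamma**k) and how many steps it takes for that weight to fall below a threshold; `weight_after` and `steps_until` are hypothetical helper names.

```python
import math


def weight_after(gamma, k):
    """Fraction of its value a reward k steps in the future keeps: gamma**k."""
    return gamma ** k


def steps_until(gamma, fraction):
    """Smallest k with gamma**k <= fraction (requires 0 < gamma < 1)."""
    return math.ceil(math.log(fraction) / math.log(gamma))


if __name__ == "__main__":
    print(f"0.99^500   = {weight_after(0.99, 500):.4f}")    # about 0.0066 (~0.6%)
    print(f"0.999^5000 = {weight_after(0.999, 5000):.4f}")  # about 0.0067 (~0.6%)
    print(steps_until(0.99, 0.01), steps_until(0.999, 0.01))
```

With gamma = 1., the value currently set, future rewards are not discounted at all.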
This is my first time tweaking the hypers, so if there's a better way, let me know.
UPDATE 08/14/18: The above code is not compatible with v0.2 as-is. The search ranges are still valid, but the syntax does not work with the hyperopt implementation in v0.2.
The following runs on v0.2. I'd like it to toggle on/off like the baseline section; I'm working on that.
'actions_exploration': {
    'type': 'ornstein_uhlenbeck',
    'sigma': hp.quniform('exploration.sigma', 0, 1, 0.05),
    'mu': hp.quniform('exploration.mu', 0, 1, 0.05),
    'theta': hp.quniform('exploration.theta', 0, 1, 0.05)
},
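On toggling exploration on and off: in hyperopt this is usually done by nesting the dict inside `hp.choice`, e.g. `hp.choice('actions_exploration', [None, {...}])`, so one branch disables exploration entirely. To show the shape of the configs such a space would produce without requiring hyperopt, the sketch below uses only the standard library; `quniform` and `sample_actions_exploration` here are hypothetical stand-ins, not hyperopt functions. The stand-in follows hyperopt's documented quniform behavior, round(uniform(low, high) / q) * q.

```python
import random


def quniform(rng, low, high, q):
    """Stand-in for hyperopt's quniform: round(uniform(low, high) / q) * q."""
    return round(rng.uniform(low, high) / q) * q


def sample_actions_exploration(rng):
    """Sample one value of a hypothetical on/off exploration space.

    Mirrors hp.choice('actions_exploration', [None, {...}]): the first
    branch disables exploration, the second samples OU parameters.
    """
    if rng.random() < 0.5:
        return None  # exploration off
    return {
        'type': 'ornstein_uhlenbeck',
        'sigma': quniform(rng, 0, 1, 0.05),
        'mu': quniform(rng, 0, 1, 0.05),
        'theta': quniform(rng, 0, 1, 0.05),
    }


if __name__ == "__main__":
    rng = random.Random(42)
    for _ in range(4):
        print(sample_actions_exploration(rng))
```

One thing this makes visible: quniform(0, 1, 0.05) can return exactly 0.0. With sigma = 0 the OU noise disappears, and with theta = 0 there is no mean reversion at all, so the lower bounds may be worth raising.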
Updated 08/19/18 to use quniform.
A brief explanation of the parameters, from https://www.maplesoft.com/support/help/maple/view.aspx?path=Finance%2FOrnsteinUhlenbeckProcess: theta is the speed of mean reversion, mu is the long-running mean, and sigma is the volatility.
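To connect the three parameters to concrete behavior: under an assumed unit-step update x <- x + theta*(mu - x) + sigma*e with e ~ N(0, 1), the deviation from mu shrinks by a factor (1 - theta) each step. Its half-life is therefore ln 2 / -ln(1 - theta) steps, and the process settles around mu with standard deviation sigma / sqrt(1 - (1 - theta)^2). These formulas follow from that discretization (an AR(1) process), not from the Maplesoft page, and the helper names are hypothetical.

```python
import math


def half_life(theta):
    """Steps for a deviation from mu to halve, assuming x <- x + theta*(mu - x) + noise."""
    if not 0.0 < theta < 1.0:
        raise ValueError("theta must be in (0, 1) for a finite, positive half-life")
    return math.log(2.0) / -math.log(1.0 - theta)


def stationary_std(sigma, theta):
    """Long-run standard deviation of the noise around mu (AR(1) variance formula)."""
    if not 0.0 < theta < 2.0:
        raise ValueError("theta must be in (0, 2) for the process to be stationary")
    return sigma / math.sqrt(1.0 - (1.0 - theta) ** 2)


if __name__ == "__main__":
    # Blog-post example: sigma=0.1, mu=0.0, theta=0.1
    print(f"half-life: {half_life(0.1):.2f} steps")           # about 6.58
    print(f"stationary std: {stationary_std(0.1, 0.1):.3f}")  # about 0.229
```

With the blog's example values, the noise has a typical magnitude of roughly 0.23 around mu = 0, and a push away from the mean fades by half in six to seven steps. Raising theta shortens that memory; sigma scales the magnitude linearly.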
Feel free to submit a pull request, or even just commit to master if you feel confident about it.
I'm going to try to get the values a little more realistic before submitting a PR. I'm letting the hypersearch run for a while so it can do its thing.