ElegantRL Agent Outputs Inconsistent Actions

Open julianzero opened this issue 1 year ago • 7 comments

An agent trained via ElegantRL, given the same input state, outputs different and seemingly random actions on each prediction. Shouldn't an agent output deterministic actions after training? Is this a bug?

Thanx.

julianzero avatar Aug 30 '23 08:08 julianzero

The model's weights are initialized randomly. This randomness can lead to inconsistent results across different runs, especially if no random seed is provided in the code.

To get consistent results, you can control this randomness by setting a fixed seed value. With a fixed seed, the random processes produce the same values on every run, making the results reproducible. Here's one way to do that, using agent.get_model from DRLAgent:

agent = DRLAgent(env=env_train)
model_a2c = agent.get_model(model_name="a2c", model_kwargs=params, seed=1)

By setting seed=1, the randomness behind the model's initial state is controlled, ensuring that you get the same result every time you run the notebook, given that all other parameters and data remain the same.
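
If the seed argument alone doesn't make your runs repeatable, you can also pin the global random number generators before training. Here's a minimal sketch assuming a PyTorch backend (which ElegantRL uses); adjust it to whatever your setup actually touches:

import random
import numpy as np
import torch

SEED = 1
random.seed(SEED)                        # Python's built-in RNG
np.random.seed(SEED)                     # NumPy RNG (data shuffling, sampling)
torch.manual_seed(SEED)                  # PyTorch RNG (weight init, dropout, sampling)
torch.cuda.manual_seed_all(SEED)         # all GPUs, if CUDA is used

# Optional: trade some speed for bit-exact GPU determinism
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False

Even with all of this, some GPU operations are only approximately deterministic, so tiny run-to-run differences can remain.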

Of course, setting a seed is just one step. For good trading performance you will likely still need to tune other hyperparameters and make sure the model neither underfits nor overfits the data.

Hope this helps!

mmmarchetti avatar Aug 30 '23 13:08 mmmarchetti

Thanx a lot. So is the randomness part of the trained models now? To fix it, do I need to re-train the models?

julianzero avatar Aug 30 '23 16:08 julianzero

Thank you for reaching out! It's important to note that randomness is inherent in deep RL training (random weight initialization, exploration noise, minibatch sampling) and is not a bug that requires fixing; it's a characteristic of the algorithm. To get consistent results, you can retrain the model with a fixed seed. To learn more about this, please visit this.
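
It also helps to separate training-time randomness from inference-time randomness: a stochastic policy samples an action on every call, so even a well-trained agent can return different actions for the same state unless you evaluate it deterministically. Here's a rough, self-contained sketch of how to check determinism of a plain forward pass; the small MLP below is just a stand-in for your loaded actor, not the actual ElegantRL network:

import torch
import torch.nn as nn

# Stand-in actor with a typical MLP shape (hypothetical sizes; use your trained actor instead)
actor = nn.Sequential(nn.Linear(30, 96), nn.ReLU(), nn.Linear(96, 1), nn.Tanh())
state = torch.randn(1, 30)               # one fixed observation

actor.eval()                             # inference mode: freeze dropout / batch-norm behavior
with torch.no_grad():
    a1 = actor(state)
    a2 = actor(state)

# A plain forward pass is deterministic; if your actions still differ at prediction time,
# the variation comes from action sampling / exploration noise or from a changing state.
print(torch.allclose(a1, a2))            # -> True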

mmmarchetti avatar Aug 30 '23 21:08 mmmarchetti

Thanks a lot for your help! As far as I know, a model trained via FinRL/ElegantRL can only output an action (i.e., a trading quantity) given a state (e.g., a day's close price). What about the next state, i.e., the next day's price? Is there a way to predict the next day's price or the next state using FinRL?

julianzero avatar Aug 31 '23 09:08 julianzero

When you feed the model the data from the previous day (or the latest market close), it outputs the suggested action for the following day. The agent predicts trading actions, not prices, so forecasting the next day's price itself would require a separate predictive model outside the RL agent.

mmmarchetti avatar Aug 31 '23 18:08 mmmarchetti

TD3_PARAMS = {
    "batch_size": 100,
    "buffer_size": 1000000,
    "learning_rate": 0.001,
    "n_steps": 1024,
    "gamma": 0.99,
    "seed": 1,
    "net_dimension": 96,
    "target_step": 1000,
    "eval_gap": 6,
    "eval_times": 2,
}

agent = DRLAgent(
    env=env,
    price_array=price_array,
    tech_array=tech_array,
    turbulence_array=turbulence_array,
)

model_td3 = agent.get_model("td3", model_kwargs=TD3_PARAMS)
cwd = env_kwargs.get('cwd', './trained_models' + '/MSFT0' + '/TD3/')
trained_td3 = agent.train_model(model=model_td3, cwd=cwd, total_timesteps=300)

account_value, actions_done = DRLAgent.DRL_prediction(
    model_name="td3",
    cwd=TRAINED_MODEL_DIR + '/MSFT{}/'.format(0),
    net_dimension=96,
    environment=env_instance,
)

I still got inconsistent actions as outputs... Could you please help check what was wrong?

Many thanx.

julianzero avatar Sep 02 '23 16:09 julianzero

With the seed set, the outputs should come out consistent. To confirm, we need to run some tests.
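
For example, a quick test is to run the prediction twice on the same trained model and compare the actions. A rough sketch, reusing the call from your snippet above (same assumptions about TRAINED_MODEL_DIR, net_dimension, and env_instance):

import numpy as np

# First prediction run
_, actions_run1 = DRLAgent.DRL_prediction(
    model_name="td3",
    cwd=TRAINED_MODEL_DIR + '/MSFT{}/'.format(0),
    net_dimension=96,
    environment=env_instance,
)

# Second run; if env_instance keeps internal state, rebuild it before this call
_, actions_run2 = DRLAgent.DRL_prediction(
    model_name="td3",
    cwd=TRAINED_MODEL_DIR + '/MSFT{}/'.format(0),
    net_dimension=96,
    environment=env_instance,
)

# If the loaded policy is evaluated deterministically, the two runs should match
print(np.allclose(np.array(actions_run1), np.array(actions_run2)))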

mmmarchetti avatar Sep 04 '23 17:09 mmmarchetti