stable-baselines3-contrib [Question] Recurrent Maskable PPO ?!? Rudder ?!?

Open tty666 opened this issue 2 months ago • 0 comments

❓ Question

Hello, I am running a lot of tests with FinRL, and using an LSTM/GRU in an actor-critic policy was one of my first ideas. I have now seen that you also support "maskable" environments. If the agent opens and closes trades, possibly with MultiDiscrete actions (% of capital invested, leverage, % stop loss, ...), it also makes sense to mask actions, for example masking the "open a trade" action when a trade is already open.

So I would like to know whether you think it is possible to combine Recurrent PPO and Maskable PPO. I am not asking for a feature, but rather for your expertise on the feasibility of mixing these two PPO implementations.

I also saw an article about "Rudder" for delayed rewards in RL, which is also based on an LSTM. Maybe it could be implemented in stable-baselines3 as well? https://ml-jku.github.io/rudder/

Thanks in advance for your answer!

(And yes, I know about the risks of financial algorithms and the gambling aspect, but that doesn't mean it's not interesting!)
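
To make the masking idea concrete, here is a minimal sketch of the kind of mask I have in mind. The action layout, the constants, and the `trade_action_mask` helper are all hypothetical and only for illustration; they are not taken from FinRL or from sb3-contrib. It assumes a MultiDiscrete action space with three dimensions and returns one flat list of booleans (the per-dimension masks concatenated), which is, as far as I understand, the shape Maskable PPO expects for MultiDiscrete spaces.

```python
from typing import List

# Hypothetical MultiDiscrete layout (an assumption for illustration only):
#   dim 0: trade action  -> 0 = hold, 1 = open, 2 = close
#   dim 1: position size -> 4 levels, index 0 meaning "unused"
#   dim 2: stop loss     -> 3 levels, index 0 meaning "unused"
ACTION_DIMS = [3, 4, 3]
HOLD, OPEN, CLOSE = 0, 1, 2


def trade_action_mask(position_open: bool) -> List[bool]:
    """Return a flat validity mask: True means the choice is allowed."""
    # Trade dimension: cannot open twice, cannot close when nothing is open.
    trade = [True] * ACTION_DIMS[0]
    if position_open:
        trade[OPEN] = False
    else:
        trade[CLOSE] = False

    if position_open:
        # Sizing choices only matter when opening, so pin them to index 0.
        size = [True] + [False] * (ACTION_DIMS[1] - 1)
        stop = [True] + [False] * (ACTION_DIMS[2] - 1)
    else:
        size = [True] * ACTION_DIMS[1]
        stop = [True] * ACTION_DIMS[2]

    # Concatenate the per-dimension masks into one flat list.
    return trade + size + stop
```

In practice the list would be converted to a NumPy boolean array and returned from the environment's `action_masks()` method, or passed through the `ActionMasker` wrapper from `sb3_contrib.common.wrappers`. Note that a mask like this cannot express dependencies between dimensions within a single step (for example "size is irrelevant if the trade action is hold"), so the environment still has to ignore the unused dimensions.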

Checklist

[X] I have checked that there is no similar issue in the repo
[X] I have read the documentation
[X] If code there is, it is minimal and working
[X] If code there is, it is formatted using the markdown code blocks for both code and stack traces.

Apr 30 '24 18:04 tty666
