
Reactivating RLZoo

Open jeremiahpslewis opened this issue 1 year ago • 12 comments

Description

Much of RLZoo needs to be migrated to the latest RLCore.jl / Flux.jl syntax.

The goal is as follows:

  1. Upgrade / refactor the code in RLZoo/src/algorithms so that it:
    • Uses the latest Flux.jl syntax
    • Seamlessly supports GPU where sensible
    • Has unit tests
  2. Add the result to the new library ReinforcementLearningFarm
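
For the unit-test goal, a minimal sketch using Julia's stdlib Test package; `greedy_action` here is a hypothetical placeholder standing in for an algorithm helper, not an actual RLZoo/RLFarm function:

```julia
using Test

# Hypothetical helper standing in for a piece of an RLZoo/RLFarm algorithm.
greedy_action(q_values::AbstractVector) = argmax(q_values)

@testset "greedy_action" begin
    @test greedy_action([0.1, 0.9, 0.3]) == 2  # picks the index of the highest Q-value
    @test greedy_action([1.0, 0.0]) == 1
end
```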

I think a good approach would be to take this folder by folder (cfr, dqns, etc.) and, where possible, reactivate the corresponding experiment in RLExperiments. One file / algorithm = one pull request, to keep things manageable / reviewable.

It probably makes sense to start with folders / files which are not commented out, before moving on to those which are currently commented out, like policy_gradient, where the code is less well maintained and will require more work.

Status

  • [ ] cfr
  • [ ] dqns
  • [ ] exploitability_descent
  • [ ] nfsp
  • [ ] offline_rl
  • [ ] policy_gradient
  • [ ] searching
  • [ ] tabular

jeremiahpslewis avatar Mar 06 '24 20:03 jeremiahpslewis

@joelreymont Would love your help!

jeremiahpslewis avatar Mar 06 '24 20:03 jeremiahpslewis

It will be done! 🫡

joelreymont avatar Mar 07 '24 07:03 joelreymont

By reactivating do you mean uncommenting these in src/algorithms/algorithms.jl?

Can you point me to a description or examples of the latest RLCore.jl / Flux.jl syntax?

joelreymont avatar Mar 07 '24 12:03 joelreymont

The experiments have been deleted. Is it safe to restore and comment them out?

joelreymont avatar Mar 07 '24 12:03 joelreymont

Is there an example of seamless GPU support in RLZoo?

joelreymont avatar Mar 07 '24 12:03 joelreymont

Yep, and going through the ones which are currently uncommented, updating them and adding tests.

The policies, learners and explorers shipped in RLCore are a good starting place for the latest syntax: https://github.com/JuliaReinforcementLearning/ReinforcementLearning.jl/blob/main/src/ReinforcementLearningCore/src/policies/q_based_policy.jl
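
As a rough illustration of the Flux side of the migration (a sketch, not code from the repo): the main change is from the deprecated implicit `Flux.params`-based training to the explicit style, where the model itself is differentiated and optimizer state is set up with `Flux.setup`:

```julia
using Flux

# `=>` in Dense is the newer layer-construction syntax (replacing Dense(4, 16, relu)).
model = Chain(Dense(4 => 16, relu), Dense(16 => 2))

x, y = rand(Float32, 4, 8), rand(Float32, 2, 8)
loss(m, x, y) = Flux.mse(m(x), y)

# Old implicit style (deprecated):
#   gs = gradient(() -> loss(model, x, y), Flux.params(model))

# New explicit style: gradients are taken with respect to the model directly.
opt_state = Flux.setup(Adam(1f-3), model)
grads = Flux.gradient(m -> loss(m, x, y), model)
Flux.update!(opt_state, model, grads[1])
```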

As for Flux / gpu, this is a good example: https://github.com/JuliaReinforcementLearning/ReinforcementLearning.jl/blob/main/src/ReinforcementLearningZoo/src/algorithms/dqns/basic_dqn.jl

In general, you can use the `gpu` function from Flux.jl to move objects to the GPU when one is available; if no GPU is active, it simply returns the object unchanged. Most of the current implementations may use a GPU but don't necessarily benefit from it; perhaps we could use https://chairmarks.lilithhafner.com/v1.1.0/ to check performance...
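
To illustrate that graceful fallback (a sketch, assuming a recent Flux version):

```julia
using Flux

model = Dense(4 => 2)

# `gpu` moves arrays/models to the GPU when a functional backend (e.g. CUDA.jl)
# is loaded; otherwise it is a no-op and returns its argument unchanged,
# so the same code runs on CPU-only machines.
model = gpu(model)
x = gpu(rand(Float32, 4))

y = model(x)  # runs on whichever device the data ended up on
```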

The experiments I'm referring to are ones here: https://github.com/JuliaReinforcementLearning/ReinforcementLearning.jl/blob/main/src/ReinforcementLearningExperiments/test/runtests.jl

jeremiahpslewis avatar Mar 07 '24 12:03 jeremiahpslewis

Are the experiments auto-generated? I don't see them in the repo.

joelreymont avatar Mar 07 '24 12:03 joelreymont

oh, sorry, it's super confusing and I have absolutely no idea why it's set up this way (maybe something to do with dependencies), but the experiments are here: https://github.com/JuliaReinforcementLearning/ReinforcementLearning.jl/tree/main/src/ReinforcementLearningExperiments/deps/experiments

maybe we should move the experiments to the tests folder or otherwise make them more accessible

jeremiahpslewis avatar Mar 07 '24 13:03 jeremiahpslewis

This is a rather large elephant for me to eat all at once so I'm gonna try small chunks and lots of questions!

Also, I need to learn about elephants, e.g. Flux, RL.jl, and reinforcement learning itself.

joelreymont avatar Mar 07 '24 13:03 joelreymont

> oh, sorry, it's super confusing and I have absolutely no idea why its setup this way (maybe something to do with dependencies), but the experiments are here: https://github.com/JuliaReinforcementLearning/ReinforcementLearning.jl/tree/main/src/ReinforcementLearningExperiments/deps/experiments

Experiments are set up this way because they used to automatically generate plots in the package's documentation. These files are not run directly; they are used to generate more source code when the package is built. Additionally, the experiments are invoked with macros; I think this is meant to mimic the naming of experiments in another Python RL package. So yeah, RL.jl is already a fairly complex codebase (too complex in my opinion), and RLExperiments is wrapped in two additional layers of complexity.

HenriDeh avatar Mar 07 '24 13:03 HenriDeh

I decided to start with CFR so I'm going through the RL blog posts and docs and will read the CFR paper when done.

In the meantime, I've been comparing the q_based_policy.jl model code with the CFR algorithm implementation. This is way over my head at the moment, so I'm gonna ask very basic questions...

Are there more precise examples that would help me learn the difference between the old RLCore and/or Flux syntax and the new one?

joelreymont avatar Mar 11 '24 11:03 joelreymont

This page may help you: https://github.com/JuliaReinforcementLearning/Reinforcement Learning.jl/blob/main/docs/src/How_to_implement_a_new_algorithm.md

HenriDeh avatar Mar 11 '24 12:03 HenriDeh

I think I understand what has to be done. Working on it...

joelreymont avatar Mar 12 '24 13:03 joelreymont

Just a note, algorithms that you work on / fix up will end up in RLFarm, not in RLZoo.

jeremiahpslewis avatar Mar 12 '24 14:03 jeremiahpslewis

https://github.com/JuliaReinforcementLearning/ReinforcementLearning.jl/tree/main/src/ReinforcementLearningFarm

jeremiahpslewis avatar Mar 12 '24 14:03 jeremiahpslewis

And what's gonna happen with RLZoo? Is it going to be left as is and removed eventually?

joelreymont avatar Mar 12 '24 14:03 joelreymont

It will be ~kept in RL.jl as an archive for a certain period of time (3 months? 6 months?)~ transferred to a separate repository as a cold archive. The same for DistributedRL and RLExperiments ~(where a new replacement package will be spun up, separate from this repo)~

Right now the issue is that if you make a change to RLCore, it doesn't pass tests or the definition of done unless RLZoo and RLExperiments have been brought up to speed. Given the state of the code in Zoo / Experiments, this makes it impossible to move RLCore forward.

jeremiahpslewis avatar Mar 12 '24 14:03 jeremiahpslewis