
Remove inheritance in favour of code duplication

Open drozzy opened this issue 2 years ago • 6 comments

This might be a weird issue, but I would like to make a case in favour of getting rid of inheritance hierarchies among policy classes in tianshou.

For example, let us take PPO policy class:

  1. To understand its behaviour, I can't just browse the single file ppo.py.
  2. I must also look at the A2C policy from which it inherits, in a2c.py.
  3. That, in turn, requires me to look at the PG class in pg.py.
  4. Finally, we get to BasePolicy, which I must also inspect in base.py.

As such, to understand the functionality of a single algorithm, we must have 4 files open and understand the interplay between all of them.

I propose flattening this hierarchy, such that all policies depend on BasePolicy only:

  1. PPO(BasePolicy)
  2. A2C(BasePolicy)
  3. PG(BasePolicy)
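
As a toy sketch of the contrast (all class bodies here are invented purely for illustration and do not reflect tianshou's actual code):

```python
# Hypothetical stubs, NOT tianshou's real API: each class body is
# invented purely to show where behaviour lives in each style.

class BasePolicy:
    """Shared interface every algorithm implements."""
    def learn(self, batch):
        raise NotImplementedError

# Current style: behaviour is spread across a chain of parents.
class PG(BasePolicy):
    def learn(self, batch):
        return {"actor_loss": 0.0}            # lives in pg.py

class A2C(PG):
    def learn(self, batch):
        stats = super().learn(batch)          # must open pg.py to follow
        stats["critic_loss"] = 0.0            # lives in a2c.py
        return stats

class DeepPPO(A2C):
    def learn(self, batch):
        stats = super().learn(batch)          # must open a2c.py AND pg.py
        stats["clip_loss"] = 0.0              # lives in ppo.py
        return stats

# Proposed style: the whole algorithm is visible in one class.
class FlatPPO(BasePolicy):
    def learn(self, batch):
        # shared math would be pulled into plain helper functions
        # instead of parent classes
        return {"actor_loss": 0.0, "critic_loss": 0.0, "clip_loss": 0.0}
```

Both versions compute the same thing; the difference is only how many files you must open to see it.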

Downsides:

  1. Code duplication. But that is not a big issue, since we can extract common functionality into functions or specialized classes.
  2. Loss of common abstractions. But I don't think anyone reading the code is really looking for common abstractions across implementations; they mostly just want the thing to work for them.

Upsides:

  1. New users will have a really easy time understanding an algorithm.
  2. We will have an easier time fixing bugs in one particular implementation.
  3. The barrier to entry for contributing will be lower, since someone who wants to fix PPO will not need to understand the PG implementation (silly example, I know).
  4. Decoupling - easier to fix one thing without breaking the other.

This is just my 2c - let me know what you think.

drozzy avatar Oct 21 '21 01:10 drozzy

Not only PG-A2C-PPO, but also:

  • A2C-NPG-TRPO
  • DDPG-TD3/SAC-DiscreteSAC
  • DQN-C51/QRDQN-FQF/IQN
  • DQN-DiscreteBCQ
  • PG-DiscreteCRR
  • QRDQN-DiscreteCQL

Are you really sure you want to rewrite all of these?

Another point: I think this style can help people more easily understand the difference between an original version (like A2C) and an improved version (like PPO).

Trinkle23897 avatar Oct 21 '21 01:10 Trinkle23897

I'm not sure (but I'm not against doing the work); I'm just bringing it up for discussion to see what your thoughts are. But I know that, personally, a line like this in PPO:

super().__init__(actor, critic, optim, dist_fn, **kwargs)

https://github.com/thu-ml/tianshou/blob/master/tianshou/policy/modelfree/ppo.py#L76 sends me hunting across files.

drozzy avatar Oct 21 '21 03:10 drozzy

I agree that a deep inheritance hierarchy is hard for beginners to understand.

gezp avatar Oct 21 '21 09:10 gezp

To solve the code duplication, how about separating some code into functions that are independent of any class? That way it could easily be reused in other Policy classes without too much duplication.

This is a good design pattern. DeepMind has done something similar with https://github.com/deepmind/rlax
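
The rlax approach can be sketched as a free function for, say, PPO's clipped surrogate loss. The function below is a single-sample, plain-Python illustration; its name and signature are invented here and are not from tianshou or rlax:

```python
# Illustrative only: PPO's clipped surrogate objective as a free
# function, independent of any policy class.

def ppo_clip_loss(ratio, advantage, eps=0.2):
    """Negative clipped surrogate objective (a loss to minimize)."""
    unclipped = ratio * advantage
    clipped = max(min(ratio, 1.0 + eps), 1.0 - eps) * advantage
    return -min(unclipped, clipped)
```

Any policy class, flat or inherited, could call such a function, which is essentially how rlax packages its loss primitives.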

Trinkle23897 avatar Oct 21 '21 11:10 Trinkle23897

Do you know, I really rather like that it is done this way; in the past it actually helped me spot a few connections between methods that I had not been aware of before. :)

michalgregor avatar Oct 21 '21 17:10 michalgregor

Another point: I think this style can help people more easily understand the difference between an original version (like A2C) and an improved version (like PPO).

I concur. Many policies are invented on top of existing ones, which makes a natural case for inheritance. As long as the hierarchy is not too deep, I feel it helps remove a lot of boilerplate code.
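
As a toy sketch of that point (simplified names, not tianshou's actual classes), the PPO subclass body reads as the delta over A2C:

```python
# Toy illustration: when PPO inherits from A2C, the subclass body is
# essentially the diff between the two papers.

class A2C:
    def compute_advantage(self, returns, values):
        # shared machinery both algorithms need
        return [r - v for r, v in zip(returns, values)]

    def actor_loss(self, ratio, adv):
        # plain policy-gradient surrogate
        return -ratio * adv

class PPO(A2C):
    # Advantage estimation is inherited unchanged; only the surrogate
    # loss, the PPO paper's actual contribution, is overridden.
    def actor_loss(self, ratio, adv, eps=0.2):
        clipped = max(min(ratio, 1.0 + eps), 1.0 - eps)
        return -min(ratio * adv, clipped * adv)
```
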

nuance1979 avatar Oct 21 '21 17:10 nuance1979

Another point: I think this style can help people more easily understand the difference between an original version (like A2C) and an improved version (like PPO).

I agree with this, and there could still be room for improvement in naming the abstractions appropriately.

For example,

  • Policy Gradient, Actor Critic, Actor Learner, and Q-Learning seem appropriate for broader, general classes of algorithms.

  • REINFORCE, A2C, A3C, PPO, and DDPG are more specific names, i.e. taken from paper titles.

It seems that PG is defined here as REINFORCE.

Of course, this classification is not perfect: Natural Policy Gradient appears in a paper's title, but it later became a subclass covering a collection of specific methods and theory.

Another example might be "ReplayBuffer". Many RL libraries use one for both on-policy and off-policy training, but the original terminology in DeepMind's DQN paper refers to stored training data for offline updates.

atsushi3110 avatar Mar 10 '23 09:03 atsushi3110

Closing this, as it is stale. In any case, it's not really an issue but something for discussion. For better or worse, inheritance of policies is a central design component of tianshou.

MischaPanch avatar Sep 24 '23 19:09 MischaPanch