
Training a Filter


Hi,

I'm trying to implement a version of RND for intrinsic rewards. I'd like it to be both environment-agnostic and agent-agnostic. The idea was to implement it either as an output filter for the environment or as an input filter for the agent. The catch is that it needs to be trained and to have its own (albeit simple) memory. I'd like to know what the best way to implement this is. I already have the filter; do I need:

1. To enable training the filter by creating a new type of Graph?
2. To wrap the agent so that the filter trains at the same time as the agent and observes everything the agent observes? (My personal preference; sketched below.)
3. To make the filter a subclass of both Filter and Agent and run it as a multi-agent setup in which it somehow doesn't act in the world but receives all the observations of the other agent?

Or, better yet, is there a feature or paradigm I'm missing?
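To make option 2 concrete, here's a rough sketch of the wrapper I have in mind. All class and method names (`RNDFilterAgentWrapper`, `observe`, `act`, `train`, `intrinsic_reward`, `store`) are placeholders of my own, not this library's actual API:

```python
# Rough sketch of option 2: a wrapper that forwards everything to the
# inner agent while training the RND filter on the same observations.
# All names here are placeholders, not the library's API.

class RNDFilterAgentWrapper:
    def __init__(self, agent, rnd_filter):
        self.agent = agent        # the wrapped, unmodified agent
        self.filter = rnd_filter  # my RND filter, with its own simple memory

    def observe(self, observation, reward, done):
        # The filter sees every observation the agent sees and adds
        # its novelty bonus to the extrinsic reward.
        intrinsic = self.filter.intrinsic_reward(observation)
        self.filter.store(observation)  # filter keeps its own memory
        return self.agent.observe(observation, reward + intrinsic, done)

    def act(self, observation):
        # Acting is delegated untouched; the filter never acts.
        return self.agent.act(observation)

    def train(self):
        # Both train in lockstep: the RND predictor fits its fixed
        # random target network while the agent does its usual update.
        self.filter.train()
        return self.agent.train()
```

The appeal of this shape is that neither the environment nor the agent has to know the filter exists, but the filter still gets a training step on every transition.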

Thanks!

ryanpeach · Nov 29 '18