
Asynchronous Reinforcement Learning

Open · RawRamen opened this issue 6 years ago • 1 comment

Hi, first of all thanks a lot for this plugin; the combination of TF + UE4 greatly broadens the horizon of what we can do with the engine, and not only in video games. Lately I've been thinking about implementing A3C (https://arxiv.org/abs/1602.01783). The thing is, in every implementation they create a scene for each actor-learner (from Python) and train each of them in parallel. Using UE4 + your plugin, how would you tackle this problem?

RawRamen · Mar 12 '18 19:03

I'm not an expert at ML, and I'm assuming you're asking how to handle the environment side of this. With that in mind:

The approach depends on the complexity of the environment you're learning against.

It's possible to encapsulate a whole environment in a single blueprint or sublevel, which you can spawn liberally and keep functionally independent from other environments.

This is perfect for simple environments, e.g. physics bodies in small spaces or simple/bounded environment games, and it's especially advantageous when rendering is not the input.
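If it helps to see the asynchronous structure the question is about, here is a minimal, framework-agnostic Python sketch of how several such independent environments could map onto A3C-style actor-learner threads. EnvStub, the tiny linear "policy", and all numbers are placeholders for illustration, not part of the plugin.

```python
# Minimal sketch: one asynchronous worker per independent in-level environment.
# EnvStub stands in for a spawned environment blueprint/sublevel; the linear
# "policy" and "gradient" are placeholders for a real network (e.g. TensorFlow).
import threading
import numpy as np

class EnvStub:
    """Toy stand-in for one independent environment instance."""
    def __init__(self, obs_dim=4):
        self.obs_dim = obs_dim
    def reset(self):
        return np.random.randn(self.obs_dim)
    def step(self, action):
        obs = np.random.randn(self.obs_dim)
        reward = -abs(action)                           # dummy reward
        done = np.random.rand() < 0.05                  # dummy episode end
        return obs, reward, done

shared = {"weights": np.zeros(4)}   # shared "network" parameters
lock = threading.Lock()             # protects asynchronous updates

def worker(env, steps=200, lr=1e-3):
    obs = env.reset()
    for _ in range(steps):
        with lock:
            local_w = shared["weights"].copy()          # sync a local copy
        action = float(local_w @ obs)                   # toy linear policy
        obs, reward, done = env.step(action)
        grad = reward * obs                             # placeholder "gradient"
        with lock:
            shared["weights"] = shared["weights"] + lr * grad  # async update
        if done:
            obs = env.reset()

threads = [threading.Thread(target=worker, args=(EnvStub(),)) for _ in range(8)]
for t in threads: t.start()
for t in threads: t.join()
print("final shared weights:", shared["weights"])
```

The point is simply that each spawned environment gets its own worker, while all workers read and update one shared set of parameters under a lock.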

If you depend on rendered input, you can use multiple render-to-target/texture components or multiple scene capture views without launching another engine instance. Their input resolutions can be scaled down, so it may be feasible to render hundreds of tiny viewports.

If the scene is highly complex and requires a full render, separate engine instances may be the best way forward; think full AAA game simulation.

To reach full potential, you may need to remove the framerate cap and use the fixed framerate setting. This gives you the maximum reliable time dilation for simulations during training: e.g. if your computer can render at 600 fps, you can effectively train at 10x against a 60 fps base with no dropped simulation steps. The regular time dilation mechanic will likely increase step sizes beyond what a stable training environment can tolerate.
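The arithmetic above is just the achievable uncapped framerate divided by the fixed simulation framerate; a tiny sketch using the example numbers from that paragraph:

```python
# Max reliable time dilation = achievable uncapped fps / fixed simulation fps.
# The values below are the example figures from the paragraph above.
uncapped_fps = 600.0   # what your machine can actually tick/render at
fixed_fps = 60.0       # fixed framerate the simulation is stepped at

max_time_dilation = uncapped_fps / fixed_fps
print(f"max reliable time dilation: {max_time_dilation:.1f}x")  # 10.0x
```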

On the Python side, each TensorFlow component has its own instance of your specified class and by default works well as an independent agent. But that means each input and each sess.run is scheduled individually, increasing call/conversion/scheduling overheads. You may want to use only one component instead, coalescing all communication for all training actors; this allows efficient Python/TF-only weight updates from peers and fewer round trips via JSON. It's easy to implement in Blueprint via a struct definition containing an array of the desired input structs.
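A minimal sketch of that single-component idea: one JSON payload carrying an array of per-actor observations is decoded, stacked into one batch, and would then go through a single (batched) session run. The payload layout, field names, and the dummy "policy" below are assumptions for illustration, not the plugin's actual schema.

```python
import json
import numpy as np

def on_json_input(payload):
    """Handle one coalesced payload for all training actors.

    In the plugin this logic would live in your Python class's JSON input
    callback; the {"actors": [{"id": ..., "obs": [...]}, ...]} layout is
    hypothetical.
    """
    data = json.loads(payload)
    obs_batch = np.array([actor["obs"] for actor in data["actors"]],
                         dtype=np.float32)          # shape: (num_actors, obs_dim)

    # One batched forward pass instead of num_actors separate calls.
    # With TensorFlow 1.x this would be a single sess.run on obs_batch;
    # here a dummy linear layer stands in for the network.
    fake_weights = np.ones((obs_batch.shape[1], 2), dtype=np.float32)
    actions = obs_batch @ fake_weights              # shape: (num_actors, 2)

    # Return one JSON response keyed by actor id.
    return json.dumps({
        "actions": {actor["id"]: actions[i].tolist()
                    for i, actor in enumerate(data["actors"])}
    })

# Example: two actors coalesced into one call.
example = json.dumps({"actors": [
    {"id": "agent0", "obs": [0.1, 0.2, 0.3]},
    {"id": "agent1", "obs": [0.4, 0.5, 0.6]},
]})
print(on_json_input(example))
```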

JSON encoding may become a limitation if a lot of data is used as state input. However, this hasn't been a limiting factor so far, even with small images (MNIST) encoded as JSON arrays. Full >1080p renders will cause a problem, but your model would be unbearably huge in that case anyway. C++/Python buffer pointer passing is also on the roadmap as an optimization, which will alleviate this limitation for large input data, but the ETA is unknown at the moment.
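To make the size concern concrete, here is a quick sketch comparing the JSON-encoded size of an MNIST-scale grayscale frame with a full-HD one (8-bit values, purely illustrative):

```python
import json
import numpy as np

def json_payload_bytes(height, width):
    """Size of a grayscale frame flattened into a JSON array of ints."""
    frame = np.random.randint(0, 256, size=(height, width), dtype=np.uint8)
    return len(json.dumps(frame.flatten().tolist()))

print("28x28 (MNIST-scale):", json_payload_bytes(28, 28), "bytes")      # a few KB per frame
print("1920x1080 (full HD):", json_payload_bytes(1920, 1080), "bytes")  # several MB per frame
```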

Finally, the TensorFlow plugin comes with a socket.io client included, which means you can potentially accumulate weight differences across many computers/VMs and push the updates via socket.io straight into the same pipeline that other actors use. This would allow you to scale further, as far as data bandwidth/crosstalk limits allow.
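As a rough sketch of that cross-machine idea, here is what pushing accumulated weight deltas over socket.io could look like from a Python peer, using the python-socketio package (an assumption on my part; the plugin itself ships the socket.io client on the Unreal side, and the event names, server URL, and payload shape below are hypothetical):

```python
# pip install python-socketio  (assumption: a Python peer using python-socketio;
# the UE-side plugin has its own socket.io client for the Blueprint/C++ end)
import numpy as np
import socketio

sio = socketio.Client()

@sio.on("global_weights")                 # hypothetical event name
def on_global_weights(data):
    # Server broadcasts merged weights; apply them to the local model here.
    merged = np.array(data["weights"], dtype=np.float32)
    print("received merged weights, norm:", float(np.linalg.norm(merged)))

def push_weight_delta(delta):
    """Send locally accumulated weight differences to the aggregation server."""
    sio.emit("weight_delta", {"delta": delta.tolist()})   # hypothetical event name

if __name__ == "__main__":
    sio.connect("http://localhost:3000")  # hypothetical aggregation server URL
    push_weight_delta(np.random.randn(4).astype(np.float32))
    sio.wait()                            # keep listening for merged weights
```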

getnamo · Mar 15 '18 19:03