update rnn example
This change reintroduces an old example, a recurrent neural network that demonstrates the flexibility of Ray's scheduling. I have updated the code to match the latest Ray API. In addition, there are now several separate scripts:
- `rnn_monolithic.py` is an idiomatic TensorFlow implementation
- `rnn_ray.py` is a distributed implementation for Ray
- `rnn_monolithic_task.py` does not use Ray, but structures the computation as we do for Ray
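As a rough illustration of what "structuring the computation as we do for Ray" means, the sketch below runs a toy RNN one timestep at a time, with each step expressed as a standalone function call. All names here (`rnn_step`, `run_rnn`, the weight matrices) are hypothetical and not taken from the scripts in this PR; in the Ray version, each step would be a `@ray.remote` task and the hidden states would be object references.

```python
import numpy as np

def rnn_step(h, x, W_h, W_x):
    # One RNN cell step. In the Ray variant this function would be
    # decorated with @ray.remote, so each step becomes a scheduled task.
    return np.tanh(W_h @ h + W_x @ x)

def run_rnn(inputs, hidden_size):
    # Toy fixed weights; a real example would train or load these.
    W_h = 0.5 * np.eye(hidden_size)
    W_x = 0.5 * np.eye(hidden_size)
    h = np.zeros(hidden_size)
    # Each step depends only on the previous hidden state, so under Ray the
    # steps form a chain of tasks; independent layers or batch elements could
    # run such chains in parallel.
    for x in inputs:
        h = rnn_step(h, x, W_h, W_x)
    return h

h = run_rnn([np.ones(4) for _ in range(8)], hidden_size=4)
print(h.shape)  # (4,)
```

The point of the task-structured (but non-Ray) variant is that it incurs the same per-step function-call structure as the distributed version, which makes it a fairer baseline for measuring Ray's scheduling overhead.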
I have also reduced the size of the computation so that it runs in an 8 GB VM instance.
At this point I am seeking feedback rather than an immediate merge. I would like to structure this example like the others, with a driver.py file. I imagine driver.py will simply run the test under Ray, while we provide separate scripts for running comparisons against the other implementations. Remaining tasks include adding documentation and adding this example to the continuous integration test suite.
Nice job! One interesting thing here is that it shows how you could parallelize an RNN by hand. This will be a very interesting example to benchmark and to try to understand what scheduling limitations we run up against as well as what sources of overhead we encounter.
For one benchmark I ran a while ago I remember getting between a 2x and 3x speedup with 3 machines, so not great, but I never successfully tracked down the overhead. If I remember correctly, rnn_monolithic.py and rnn_monolithic_task.py should have roughly the same performance.
Structuring this like the other example applications would be great. I'd focus on the distributed example, and possibly include the pure TF example only in the README (although it will be important for benchmarking). When we finally merge it, I think it makes sense to slim down the example a lot (e.g., just one RNN; we don't need separate examples for 1, 2, and 3 layers).