machinelearning
Reinforcement learning
I've looked into the available documentation and examples, but haven't been able to figure out whether it is possible to use ML.NET in its current state for (non-deep) reinforcement learning. If it is possible, I'd be thankful for any hints on how to implement a simple case. If reinforcement learning is not possible at the moment, what exactly is missing, and are there any plans to implement the missing pieces?
Thanks!
Reinforcement learning is not yet available in ML.NET. One critical component that is currently missing is "exploration" (trying out different actions to see how the reward changes) and how this affects data collection and model training. This is something that would be interesting to explore in the future.
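To make the "exploration" component concrete, here is a minimal sketch of epsilon-greedy action selection, one common exploration strategy. It is plain Python rather than anything ML.NET provides, and the names `epsilon_greedy` and `q_values` are hypothetical, chosen only for illustration.

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick an action index: a random one with probability epsilon,
    otherwise the one with the highest current reward estimate.

    q_values: list of estimated rewards, one per action (hypothetical input).
    """
    if rng.random() < epsilon:
        # Explore: try any action to see how the reward changes.
        return rng.randrange(len(q_values))
    # Exploit: use the best action known so far.
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

With `epsilon=0.0` the agent never explores, so the data it collects only ever covers the actions it already prefers; this is why exploration affects both data collection and training.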
What is the scenario you have in mind?
I see. My scenario is basically to use it as a part of building a game AI.
Got it. A few more questions: what is the complexity of the game? How many actions does the AI have (the number of choices available to it at each step)? Does the list of actions change throughout the course of the game or is it static?
Well, what I'm actually trying to do is build a framework that would be used to write game rules (gameplay code) and would make it as easy as possible to add AI actors as well. In the end, anything but a trivial AI would probably have to be built from several components, some ML and some using more traditional features of the framework. The end goal is to allow building games of any practical complexity, but no 'academic restrictions' apply here. RL can only be one piece of the AI puzzle, and for example 'goal hacking' is certainly an option if it makes the AI behave in a desirable way :).
While a game AI can be extremely complex, I'm hoping to find ways to split the problem domains into smaller pieces that could then be handled by individual 'RL pipelines'. Maybe even just learn to estimate the value of taking each action in a given situation individually. I think Microsoft's Ms. Pac-Man AI used this kind of approach.
It's certainly common for the list of possible actions an actor can take to change throughout the course of a game, but if the value of each action (for each target) were evaluated by its own NN, I guess handling the changes would be a matter of updating the list of evaluated NNs (and whatever logic coordinates between them).
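The idea of one value estimator per action, with the set of available actions changing over time, could be sketched roughly as follows. This is an illustration in Python, not an ML.NET design: `ActionSelector`, the action names, and the state keys are all hypothetical, and plain functions stand in for the per-action neural networks.

```python
from typing import Callable, Dict, Hashable, Iterable, Optional

# Hypothetical: each estimator maps a game state to an estimated value.
# In a real system each one could be backed by its own trained model.
Estimator = Callable[[dict], float]

class ActionSelector:
    """Coordinates independent per-action value estimators."""

    def __init__(self) -> None:
        self.estimators: Dict[Hashable, Estimator] = {}

    def register(self, action: Hashable, estimator: Estimator) -> None:
        self.estimators[action] = estimator

    def best_action(self, state: dict,
                    available: Iterable[Hashable]) -> Optional[Hashable]:
        """Return the highest-valued action among those currently available."""
        candidates = [a for a in available if a in self.estimators]
        if not candidates:
            return None
        return max(candidates, key=lambda a: self.estimators[a](state))

selector = ActionSelector()
selector.register("attack", lambda s: s["enemy_near"] * 1.0)
selector.register("flee", lambda s: 1.0 - s["health"])
selector.register("heal", lambda s: (1.0 - s["health"]) * s["has_potion"])
```

When the list of legal actions changes, only the `available` argument changes; the estimators themselves stay registered and untouched.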
This is all very much just an idea at the moment, but I guess even a very simple RL implementation would allow meaningful experimentation. I might look into CNTK and its C# bindings now, but as I have no practical experience with ML, I wouldn't mind something a bit easier to approach.
To implement RL such as DQN, I need to train the neural network after each observation. However, I don't know how to split the training data to train the neural network with 'TextLoader'. Is there any way to feed in training data gradually? Edit: well, I found this in: https://github.com/dotnet/machinelearning/blob/f6c6f5b0ef4826844375c85438be21134fe4356d/test/Microsoft.ML.Tests/CollectionDataSourceTests.cs#L29
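For readers wondering what "training after each observation" looks like in general, here is a minimal sketch of an incremental value update. It is an assumption-laden stand-in, not DQN and not ML.NET code: a linear model replaces the neural network, and `LinearQ` and its parameters are hypothetical names.

```python
class LinearQ:
    """Q(s, a) = w[a] . features(s), updated one transition at a time."""

    def __init__(self, n_features, n_actions, lr=0.1, gamma=0.9):
        self.w = [[0.0] * n_features for _ in range(n_actions)]
        self.lr = lr        # learning rate
        self.gamma = gamma  # discount factor

    def q(self, features, action):
        return sum(wi * xi for wi, xi in zip(self.w[action], features))

    def update(self, features, action, reward, next_features, done):
        """One semi-gradient Q-learning step from a single observation."""
        target = reward
        if not done:
            # Bootstrap from the best estimated value of the next state.
            target += self.gamma * max(
                self.q(next_features, a) for a in range(len(self.w))
            )
        td_error = target - self.q(features, action)
        # Gradient of a linear model w.r.t. its weights is the feature vector.
        self.w[action] = [
            wi + self.lr * td_error * xi
            for wi, xi in zip(self.w[action], features)
        ]
        return td_error
```

Each call to `update` consumes exactly one (state, action, reward, next state) observation, so no up-front dataset is needed; the open question in this thread is how to get the same incremental behavior out of ML.NET's data-loading pipeline.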
Any news about some RL support?
CC: @wschin
Would be nice to see this implemented some time.
We don't have RL support in the short-term but it is something we are considering for long-term.
Was the closing of this issue intended? How is this tracked from now on?
Our short term goals are around ONNX, DNN training and AutoML. RL is something we have considered and is on our list but just not something we plan to work on this year. Feel free to send a PR on RL and we will happily review.
I understand this is not a priority right now. I would just like to get updates on this particular feature.
Is there another issue for tracking progress in this area, or how do I get updates to things "on your list"? Where is "your list", if it's not the GitHub issues?
You will get updates when this issue is reopened. Please read our roadmap and feel free to ask questions on Gitter regarding specific feature updates. Thank you for your interest.
@codemzs I was wondering how this fits with the RL work that Unity is doing. They have their ML-agents, which uses TF for the training part, but for inference they developed Barracuda which is based on DirectML and allows use of ONNX models. Since Unity uses C# this looks like a pretty obvious case for using ML.NET with Unity. What is currently missing in ML.NET to implement some general model like PPO?
@codemzs any update on RL?
@david-uk-hash Thanks for checking back. Unfortunately, we do not plan to support RL in the near future. This is an open source project so we encourage and welcome contributions.
Is RL on your roadmap at all? I have a model developed in VB.NET that I currently use with Python for RL via some unpleasant hijinks; being able to stay within a single process would have some really significant benefits.
Even a fully featured neural network API in C# would make our lives easier, as PPO or NEAT could then be implemented easily.
Reopening this issue to gather feedback.
@luisquintanilla What's the current state of this project? Wouldn't it be a unique selling point of ML.NET to provide an efficient back-to-back training environment for RL without slow Python procedures, etc.? :wink:
Currently, I'm implementing a card game AI with ML.NET and TensorFlow. If I make any useful progress, I'm really open to contributing my results and sharing it with you such that we can bake it into a framework :smile:
Regarding the implementation part, I'd suggest roughly the following components:
- some kind of environment collecting SARS (state, action, reward, next state) tuples / carrying out the simulation
- an experience replay memory (e.g. as ring buffer) to sample from
- a set of exploration strategies (like epsilon-greedy)
- a set of reward computation strategies regarding an action backlog
- an interface for custom TD learning algorithms
- some standard implementations like Q learning with an explicit Q table, Deep Q, scalable A3C, Monte-Carlo tree search, etc.
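A few of the components listed above can be sketched together in a small, self-contained example: a toy environment, a ring-buffer replay memory, epsilon-greedy exploration, and Q-learning with an explicit Q-table. This is only an illustration in Python under stated assumptions, not a proposed ML.NET API; `Corridor`, `ReplayMemory`, `train`, and all hyperparameter values are hypothetical.

```python
import random
from collections import defaultdict, deque

class ReplayMemory:
    """Fixed-size ring buffer of (state, action, reward, next_state, done)."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest entries drop off

    def push(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size, rng):
        return rng.sample(list(self.buffer), min(batch_size, len(self.buffer)))

class Corridor:
    """Toy environment: walk from cell 0 to the last cell to earn reward 1."""

    def __init__(self, length=5):
        self.length = length
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):  # 0 = left, 1 = right
        move = 1 if action == 1 else -1
        self.pos = max(0, min(self.length - 1, self.pos + move))
        done = self.pos == self.length - 1
        return self.pos, (1.0 if done else 0.0), done

def train(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    env, memory = Corridor(), ReplayMemory(capacity=500)
    q = defaultdict(lambda: [0.0, 0.0])  # Q-table: state -> value per action
    for _ in range(episodes):
        state, done, steps = env.reset(), False, 0
        while not done and steps < 100:
            # Epsilon-greedy exploration (ties go to action 1).
            if rng.random() < epsilon:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = env.step(action)
            memory.push((state, action, reward, next_state, done))
            # Q-learning update on a small batch drawn from the replay memory.
            for s, a, r, s2, d in memory.sample(8, rng):
                target = r if d else r + gamma * max(q[s2])
                q[s][a] += alpha * (target - q[s][a])
            state = next_state
            steps += 1
    return q
```

Calling `train()` returns the learned Q-table; in this toy corridor, moving right should end up valued higher than moving left in every non-terminal cell. Reward strategies, A3C, and tree search from the list are deliberately left out to keep the sketch short.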
Let me know what you think about it. I'm convinced that C# / ML.NET can become a high-performance RL environment because you can bypass slow Python procedures, perform SIMD optimizations, etc.
Best wishes, Marco
My interest is in evaluating different variants of backgammon and assigning skill rating to each - with the goal of finding the variant that most rewards skill.
Skill level 0 is random moves. Skill level 1 beats level 0 75% of the time. Skill level 2 beats level 1 75% of the time. ...
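The 75% ladder above only fixes the win rate between adjacent levels. As a rough sketch of how it could extend to players several levels apart, the following assumes a Bradley-Terry (logistic) model in which the odds multiply per level; that model is an assumption added here, not something the definition implies, and the function names are hypothetical.

```python
import math

def win_probability(level_gap, step_win_rate=0.75):
    """P(stronger side wins) when the players are `level_gap` levels apart.

    Assumes odds multiply per level: a 75% step means odds of 3:1 per level.
    """
    odds_per_level = step_win_rate / (1.0 - step_win_rate)
    odds = odds_per_level ** level_gap
    return odds / (1.0 + odds)

def level_gap_from_win_rate(win_rate, step_win_rate=0.75):
    """Inverse: estimated level gap for an observed win rate (0 < rate < 1)."""
    odds_per_level = step_win_rate / (1.0 - step_win_rate)
    return math.log(win_rate / (1.0 - win_rate)) / math.log(odds_per_level)
```

Under this assumption, a level 2 player would beat a level 0 (random) player 90% of the time, since odds of 3:1 per level compound to 9:1 over two levels. A trained agent's level could then be estimated from its measured win rate against random play.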
We're looking to use it for climate control in a small warehouse-type building, which has large fans that pull heat out and ducts that bring outside air in. We need the RL agent to read the inside and outside temperature and humidity, along with the fans' speed and CFM, in order to maintain the temperature and humidity inside.
@luisquintanilla
We need reinforcement learning for an MMO game that is already being developed in pure .NET (no Unity), to create the intelligence of characters and various machines. I thought it was already available, but it turns out we have to wait.
@luisquintanilla any thoughts on this?
@luisquintanilla Any idea whether reinforcement learning will be, might be, or probably won't be on the ML.NET 3.0 roadmap?
@torronen Not for 3.0. We're still learning about this scenario.
This would seem to contradict the following bug: https://github.com/dotnet/machinelearning/issues/5918
Which says that there most certainly is a plan being followed for deep learning support. Does it include reinforcement learning?
@jez9999 Could you clarify how #5918 contradicts the statement in this issue?
There is a plan for deep learning which we're actively working on. We're still learning about reinforcement learning and not planning on implementing anything during the ML.NET 3.0 timeframe (Nov 2023). So to answer your second question - our current plan for deep learning doesn't include reinforcement learning.
OK, I thought that integration with TorchSharp/PyTorch implied reinforcement learning, because it seems you can do it with PyTorch.