mlpack
AlphaZero
This pull request adds an implementation of the AlphaZero algorithm to mlpack, along with several supporting components and modifications. The major additions and changes are:
Key Contributions:
- RandomCategorical:
- Added a new random generator for sampling from categorical distributions. This is useful for probabilistic policy selection, particularly in reinforcement learning tasks like AlphaZero.
- CrossEntropy:
- Implemented a new cross-entropy loss function designed to compute the loss between two distributions.
- Removed the misleading alias of the binary cross-entropy loss (BCELoss), which could cause confusion in certain contexts. This change improves clarity and precision in the library's functionality.
- AlphaZero:
- Complete implementation of the AlphaZero algorithm, which includes both training and inference capabilities.
- A comprehensive tutorial is provided to demonstrate how to use AlphaZero in practice. This includes detailed steps for training, policy updates, and using the model for decision-making.
- RandomReplay (Significant Modification):
- Major enhancements to the existing RandomReplay class to allow the storage and retrieval of policy vectors, which is essential for AlphaZero's experience replay mechanism. This modification enables more flexible replay capabilities.
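To make the first two items concrete, here is a minimal sketch in standard C++ of sampling an action index from a categorical policy and of computing the cross-entropy between a target distribution and a predicted one. It is not the code from this PR: the names SampleCategorical and CrossEntropy, their signatures, and the small eps added inside the logarithm are assumptions made for illustration, and std::vector stands in for the Armadillo types mlpack uses.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper (not the PR's RandomCategorical): draw index i with
// probability probs[i] by walking the cumulative distribution.
// Assumes probs is non-negative and sums to (approximately) 1.
inline std::size_t SampleCategorical(const std::vector<double>& probs,
                                     std::mt19937& rng)
{
  assert(!probs.empty());
  std::uniform_real_distribution<double> uniform(0.0, 1.0);
  const double u = uniform(rng);

  double cumulative = 0.0;
  std::size_t lastPositive = 0;
  for (std::size_t i = 0; i < probs.size(); ++i)
  {
    if (probs[i] > 0.0)
      lastPositive = i;
    cumulative += probs[i];
    if (u < cumulative)
      return i;
  }
  // Rounding can leave the total slightly below u; fall back to the last
  // index that had non-zero probability.
  return lastPositive;
}

// Hypothetical helper (not the PR's CrossEntropy layer): computes
// H(target, predicted) = -sum_i target[i] * log(predicted[i] + eps).
// eps is an assumed guard against log(0).
inline double CrossEntropy(const std::vector<double>& target,
                           const std::vector<double>& predicted,
                           const double eps = 1e-12)
{
  assert(target.size() == predicted.size());
  double loss = 0.0;
  for (std::size_t i = 0; i < target.size(); ++i)
  {
    // Terms with zero target probability contribute nothing.
    if (target[i] > 0.0)
      loss -= target[i] * std::log(predicted[i] + eps);
  }
  return loss;
}
```

In an AlphaZero-style loop, the target would typically be the visit-count distribution produced by the tree search and the predicted distribution the network's policy output; when the two coincide, the loss reduces to the entropy of the target.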
Additional Information:
- Documentation and Tutorials:
- The PR includes a detailed tutorial on how to use AlphaZero within mlpack. This tutorial covers setup and training, making it easier for users to get started with the algorithm.
- Backward Compatibility:
- Where applicable, care has been taken to ensure backward compatibility, particularly with the modifications to RandomReplay and the removal of the misleading alias in BCELoss.
This pull request aims to integrate a robust and flexible implementation of AlphaZero, accompanied by essential enhancements in loss functions and replay buffers. These changes are critical for enabling reinforcement learning workflows within mlpack and expanding the library's capabilities in this domain.
How to Review:
- The changes can be reviewed incrementally, starting with the RandomCategorical class and the CrossEntropy function.
- Special attention should be given to the modifications in RandomReplay, as it is a core component of the AlphaZero algorithm's experience replay mechanism.
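As a reading aid for the RandomReplay changes, the sketch below shows one way a replay buffer can keep a policy vector next to each stored state. It is an illustration under stated assumptions, not the modified RandomReplay itself: PolicyTransition, PolicyReplaySketch, Store, Sample, and Size are hypothetical names, std::vector replaces Armadillo types, and uniform sampling with replacement is chosen only for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Hypothetical record type: one stored step together with its policy.
struct PolicyTransition
{
  std::vector<double> state;   // Encoded state (assumed flat vector).
  std::vector<double> policy;  // Probability over actions for this state.
  double value;                // Outcome associated with this state.
};

// Hypothetical fixed-capacity ring buffer; the oldest entry is overwritten
// once the buffer is full.
class PolicyReplaySketch
{
 public:
  explicit PolicyReplaySketch(const std::size_t maxSize) :
      capacity(maxSize), position(0)
  {
    assert(maxSize > 0);
    buffer.reserve(maxSize);
  }

  // Store one transition, overwriting the oldest one when full.
  void Store(PolicyTransition t)
  {
    if (buffer.size() < capacity)
      buffer.push_back(std::move(t));
    else
      buffer[position] = std::move(t);
    position = (position + 1) % capacity;
  }

  // Draw batchSize transitions uniformly at random, with replacement.
  std::vector<PolicyTransition> Sample(const std::size_t batchSize,
                                       std::mt19937& rng) const
  {
    assert(!buffer.empty());
    std::uniform_int_distribution<std::size_t> pick(0, buffer.size() - 1);
    std::vector<PolicyTransition> batch;
    batch.reserve(batchSize);
    for (std::size_t i = 0; i < batchSize; ++i)
      batch.push_back(buffer[pick(rng)]);
    return batch;
  }

  std::size_t Size() const { return buffer.size(); }

 private:
  std::size_t capacity;
  std::size_t position;
  std::vector<PolicyTransition> buffer;
};
```

Keeping the policy inside the same record as the state means a sampled batch always pairs each state with the policy target it was generated with, which is the property a policy loss depends on.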
Bonjour Antoine,
Thank you for adding this. I had a quick look at the code; could you verify the following:
- Please make sure that your PR, including the tests, compiles locally on your machine, and that the tests pass.
- Be sure to follow the mlpack style; you can use the clang_format from one of the open PRs to fix it.
- Please templatize the use of Armadillo; check other methods for inspiration.
- Ping me and @rcurtin when you finish all of the above.
Merci
Omar, I understand that you are concerned, but I want to clarify that implementing AlphaZero in mlpack has required significant work on my part, and I am committed to contributing to mlpack. I was surprised and concerned by your message threatening to report my account to GitHub. My intention has never been to spam or corrupt mlpack. While I used AI to proofread the pull request description, the code itself is the result of thorough work and, I believe, is sound. I am happy to address the feedback on the pull request to ensure it meets the project’s standards. I hope we can move forward constructively. Antoine Dubois
@AntoineDubois Okay, understood, and sorry for my comment. You should be able to handle the comments that we have left; please let us know if you need any help.