Question about MCTS tree retention strategy and potential improvement
I've been studying the MCTS implementation in this project and noticed that the current approach either resets the search tree between moves or retains only a limited portion of history (around 7 steps). While reviewing the code and discussions, I started wondering whether this limits the algorithm's ability to learn from complete game trajectories. Would it be worth exploring an alternative approach to tree retention? In my local implementation I've been experimenting with a simple path-oriented structure (a rough sketch follows the list below) that attempts to:
- Store decision paths more efficiently
- Retain more of the search history between moves
- Allow for better statistical analysis of move sequences
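To make this more concrete, here is a minimal sketch of the kind of structure I have in mind. This is only an illustration of my experiment, not anything from this project's codebase; names like `PathStore`, `PathStats`, and `record_path` are placeholders I made up. The idea is to key statistics by move sequences rather than by tree nodes, so accumulated history can survive root changes between moves:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class PathStats:
    """Aggregated statistics for one move sequence (hypothetical sketch)."""
    visits: int = 0
    total_value: float = 0.0

    @property
    def mean_value(self) -> float:
        return self.total_value / self.visits if self.visits else 0.0


class PathStore:
    """Keeps statistics keyed by move sequences instead of tree nodes,
    so history is retained across root changes between moves."""

    def __init__(self):
        # Key: tuple of moves from the start of the game; value: aggregated stats.
        self._stats: dict[tuple, PathStats] = defaultdict(PathStats)

    def record_path(self, moves, value):
        """Update statistics for every prefix of a simulated move sequence."""
        moves = tuple(moves)
        for i in range(1, len(moves) + 1):
            stats = self._stats[moves[:i]]
            stats.visits += 1
            stats.total_value += value

    def stats_for(self, moves):
        """Look up statistics for a specific move sequence, if recorded."""
        return self._stats.get(tuple(moves))
```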
In my limited testing with Gomoku I've seen some promising initial results, but I'm not sure whether this approach would be valuable in the broader context of this project. My implementation is still very early-stage and needs significant refinement. Some early observations (a sketch of how I use the path statistics for move selection follows the list):
- The memory overhead seems manageable
- Basic path retrieval performs reasonably well
- Statistical tracking along paths appears useful for move selection
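On the last point, this is roughly how I'm using the retained statistics for move selection. Again, this is just a sketch of my experiment under my own assumptions (a UCB1-style score over path statistics); `select_move` and the `PathStore` it relies on are my placeholder names, not part of this project:

```python
import math


def select_move(store, current_path, legal_moves, c=1.4):
    """Rank candidate next moves using the retained path statistics.

    UCB1-style score: moves with no recorded statistics are preferred so
    they get explored first (hypothetical sketch, not the project's code).
    """
    parent = store.stats_for(current_path)
    parent_visits = parent.visits if parent else 1

    def score(move):
        s = store.stats_for(tuple(current_path) + (move,))
        if s is None:
            return float("inf")  # no statistics yet: explore this move first
        return s.mean_value + c * math.sqrt(math.log(parent_visits) / s.visits)

    return max(legal_moves, key=score)


# Example usage with made-up Gomoku coordinates:
# store = PathStore()
# store.record_path([(7, 7), (7, 8), (8, 8)], value=1.0)  # one simulated line of play
# move = select_move(store, current_path=[(7, 7)], legal_moves=[(7, 8), (6, 6)])
```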
I'm curious whether others have had similar thoughts about tree retention, or whether there are good reasons for the current approach that I'm missing. Has the team already explored alternatives to the current tree reset/limited retention strategy? I'd appreciate any insights or guidance on whether this direction is worth pursuing further, or if there are better approaches I should consider instead.