MCTS.jl

Best path seen over entire search

Open rcnlee opened this issue 7 years ago • 1 comments

What's the best way to get the best path seen over the entire search in DPW? By that I mean the sequence of (s, a, r) tuples with the highest total reward encountered over all samples, which probably occurs during a rollout.

If it doesn't currently exist, how can I implement it?

I see that there is a new action_info architecture where extra info can be returned from action. But you don't get the rollout portion of the sequence, because it is hidden in estimate_value, which calls RandomSolver. So is the easiest way to write my own rollout function that wraps the existing one? Or is there a better way?

Thanks!

rcnlee avatar Aug 08 '18 01:08 rcnlee

Hmm... yeah, that seems kind of hard right now :/ I definitely didn't plan for it when writing the package. Do you even know how to get the tree-search portion of the trajectory? Since simulate is called recursively, it seems like you would have to pass more arguments into simulate to keep track of the trajectory.
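
To make the "pass more arguments into simulate" idea concrete, here is a minimal standalone sketch. It does not use MCTS.jl's actual simulate or tree types: `BestPath`, `toy_step`, and `simulate!` are hypothetical names, the random action choice is only a placeholder for the tree policy, and the toy chain problem stands in for a real generative model. The total reward is undiscounted here.

```julia
using Random

# Hypothetical container for the best trajectory seen so far.
mutable struct BestPath
    total::Float64
    steps::Vector{Tuple{Int,Int,Float64}}   # (s, a, r) tuples
end
BestPath() = BestPath(-Inf, Tuple{Int,Int,Float64}[])

# Toy generative model (an assumption, standing in for the real problem):
# the action moves the state forward and the reward is a noisy function of it.
toy_step(rng, s, a) = (s + a, a * 1.0 + 0.1 * randn(rng))

# Recursive simulation that threads the running trajectory and its
# undiscounted total reward down through the recursion.
function simulate!(best::BestPath, rng, s, depth, traj, total)
    if depth == 0
        if total > best.total          # end of a sample: compare and record
            best.total = total
            best.steps = copy(traj)
        end
        return 0.0
    end
    a = rand(rng, 1:2)                 # placeholder for the tree policy
    sp, r = toy_step(rng, s, a)
    push!(traj, (s, a, r))
    q = r + simulate!(best, rng, sp, depth - 1, traj, total + r)
    pop!(traj)                         # restore the trajectory for the caller
    return q
end

best = BestPath()
rng = MersenneTwister(1)
for _ in 1:100
    simulate!(best, rng, 0, 5, Tuple{Int,Int,Float64}[], 0.0)
end
```

After the loop, `best.steps` holds the (s, a, r) tuples of the highest-reward sample and `best.total` its total reward. The push!/pop! pair keeps a single trajectory vector valid across the recursion, so only the winning sample is ever copied.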

If you just want the rollout portion, yes, you would just need to implement a new type for estimate_value that keeps track of such things, but I think you will have to write your own version of MCTS.simulate and maybe a few other functions to keep track of the entire trajectory including when it traverses the tree. You could still use the existing tree structures, etc. though.
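
For the rollout-only case, a rough sketch of such a tracking type could look like the following. Everything here is an assumption for illustration: `RecordingRollout` and `toy_gen` are hypothetical, the toy transition replaces a real model, and the call signature `(problem, s, depth)` should be checked against whatever the solver's estimate_value option actually expects before using it.

```julia
using Random

# Hypothetical callable that runs a random rollout and keeps the best one.
mutable struct RecordingRollout{R<:AbstractRNG}
    rng::R
    best_total::Float64
    best_steps::Vector{Tuple{Int,Int,Float64}}
end
RecordingRollout(rng) = RecordingRollout(rng, -Inf, Tuple{Int,Int,Float64}[])

# Toy transition used in place of a real model (an assumption).
toy_gen(rng, s, a) = (s + a, Float64(a) - 0.5 * rand(rng))

# Called like f(problem, s, depth); `problem` is unused in this toy version.
function (ro::RecordingRollout)(problem, s, depth)
    steps = Tuple{Int,Int,Float64}[]
    total = 0.0
    for _ in 1:depth
        a = rand(ro.rng, 1:2)          # uniformly random rollout policy
        sp, r = toy_gen(ro.rng, s, a)
        push!(steps, (s, a, r))
        total += r                     # undiscounted sum of rewards
        s = sp
    end
    if total > ro.best_total           # remember the best rollout so far
        ro.best_total = total
        ro.best_steps = steps
    end
    return total                       # the value estimate handed back
end

ro = RecordingRollout(MersenneTwister(2))
estimates = [ro(nothing, 0, 4) for _ in 1:50]
```

Because the object is mutable, it can be inspected after planning (`ro.best_steps`, `ro.best_total`). It only sees the rollout part, though, so the steps taken inside the tree would still be missing.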

zsunberg avatar Aug 09 '18 02:08 zsunberg