irl-maxent
If the trajectory stays in the terminal state (for a limited number of steps)
Hi, thank you so much for this amazing repo. I have been trying to build my own environment, but I ran into some issues. What if we have something like this: the trajectory goes from s0 to s1 to s2 and then stays in s3 forever (I changed the value iteration so that all my trajectories are 50 steps long), so my SVF (state visitation frequency) is something like (1, 1, 1, 47, 0, ..., 0)?
However, I am running into difficulties: my zs and za keep growing until they become nan, and as a result my omega becomes nan as well. Do you have any idea how I can fix this, and what the underlying problem is? I am reading Dr. Ziebart's thesis but still have no clue how to tackle this. Since z_terminal is 1, I suspect that might be what causes the problem. If you have any idea, I would be very grateful if you could share your thoughts. Thanks again!
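One likely cause of the overflow: in the backward pass of Ziebart's algorithm, each of the N iterations multiplies the partition functions by exp(r(s)), and with an absorbing terminal state the "+1" term for z_terminal is re-added on every pass, so Z_s and Z_a grow geometrically with N when rewards are positive. Once exp overflows to inf, the ratio Z_a / Z_s becomes inf/inf = nan, which then propagates into omega. A common remedy is to run the same recursion on log Z using log-sum-exp, so that only differences of logs are ever exponentiated. The sketch below assumes NumPy, a transition array laid out as p_transition[s, s_next, a], a per-state reward vector, and a list of terminal state indices; the function names are hypothetical and not part of the repo's API.

```python
import numpy as np


def logsumexp(x, axis):
    """Numerically stable log(sum(exp(x))) along `axis`; all -inf slices give -inf."""
    m = np.max(x, axis=axis, keepdims=True)
    m = np.where(np.isfinite(m), m, 0.0)  # avoid (-inf) - (-inf) = nan
    s = np.sum(np.exp(x - m), axis=axis, keepdims=True)
    with np.errstate(divide="ignore"):
        out = np.log(s) + m  # log(0) -> -inf for all -inf slices
    return np.squeeze(out, axis=axis)


def backward_pass_log(p_transition, reward, terminal, horizon):
    """Ziebart-style backward pass (partition functions Z_s, Z_a) in log space.

    Hypothetical layout (adapt to your environment):
      p_transition[s, s_next, a] = P(s_next | s, a)
      reward[s]                  = reward of state s under the current weights
      terminal                   = list of terminal state indices
    Returns log Z_s (S,), log Z_a (S, A) and local action probabilities
    P(a | s) = Z_a / Z_s (nan for states that cannot reach a terminal state).
    Assumes horizon >= 1.
    """
    n_states = p_transition.shape[0]
    with np.errstate(divide="ignore"):
        log_p = np.log(p_transition)  # zero-probability transitions -> -inf

    # Z_s = 1 at terminal states and 0 elsewhere, i.e. log Z_s = 0 / -inf.
    log_zs = np.full(n_states, -np.inf)
    log_zs[terminal] = 0.0

    for _ in range(horizon):
        # log Z_a[s, a] = r(s) + log sum_{s'} P(s' | s, a) Z_s[s']
        log_za = reward[:, None] + logsumexp(log_p + log_zs[None, :, None], axis=1)
        # log Z_s[s] = log sum_a Z_a[s, a], plus the "+1" term at terminal states
        log_zs = logsumexp(log_za, axis=1)
        log_zs[terminal] = np.logaddexp(log_zs[terminal], 0.0)

    # P(a | s) = Z_a / Z_s as a difference of logs, so nothing overflows.
    with np.errstate(invalid="ignore"):
        p_action = np.exp(log_za - log_zs[:, None])
    return log_zs, log_za, p_action
```

For scale: with a reward of 20 in every state of a 4-state chain and N = 50, the unnormalized Z_s would be on the order of exp(1000), far beyond the largest finite float64 (about exp(709)), whereas log Z_s stays around 10^3. P(a | s) is still undefined for states from which no terminal state is reachable within N steps, because Z_s = 0 there.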
Here is a bit more info:
My trajectory (I made it 40 steps long this time)
The first iteration, with the parameters initialized to 1
The next array is my parameters after the first iteration
However, after the second iteration they all end up being nan.
Hi, I'm sorry for the long silence. I will likely need some time to look into this, and since I have been a bit busy with work-related things lately, I never got around to it. Things should be less stressful now, so I will try to look into it this weekend.
Thank you so much. I made some modifications, like normalizing the rewards and weights at each iteration to keep them from going to infinity, but I still have to limit the number of iterations, since it never converges. I am looking forward to your insight as well.
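Normalizing the weights keeps the numbers finite, but it does not explain why the optimization never converges. The MaxEnt gradient is the demonstrated feature counts minus the expected ones, Σ_s (D̃_s − D_s) f_s, so it can only vanish if the expected SVF D_s is able to reach the demonstrated one, e.g. (1, 1, 1, 47). One possible cause, stated here as an assumption worth checking: if the forward pass zeroes the terminal state's outgoing transitions to model the episode ending on arrival, s3 is counted at most once per trajectory, so with one-hot state features the gradient component for s3 is at least 47 − 1 = 46 whatever the weights are, and the s3 weight keeps increasing. The sketch below instead keeps probability mass in the absorbing state over a fixed horizon. It assumes NumPy, the same hypothetical p_transition[s, s_next, a] layout, a time-invariant P(a | s), and identity state features in the example; the function names are hypothetical.

```python
import numpy as np


def expected_svf(p_transition, p_action, p_initial, horizon):
    """Expected state visitation counts over a fixed horizon (forward pass).

    Hypothetical layout:
      p_transition[s, s_next, a] = P(s_next | s, a)
      p_action[s, a]             = P(a | s), time-invariant
      p_initial[s]               = start-state distribution
    Mass that reaches an absorbing state stays there, so repeated visits
    are counted, matching demonstrations that sit in s3 until step `horizon`.
    """
    d = np.asarray(p_initial, dtype=float).copy()  # D_t at t = 0
    svf = d.copy()
    for _ in range(horizon - 1):
        # D_{t+1}[s'] = sum_{s, a} D_t[s] * P(a | s) * P(s' | s, a)
        d = np.einsum("s,sa,sta->t", d, p_action, p_transition)
        svf += d
    return svf


def maxent_gradient(features, svf_demo, svf_expected):
    """MaxEnt IRL log-likelihood gradient w.r.t. the reward weights.

    features[s, k] is the feature vector of state s; svf_demo is the
    empirical per-trajectory visitation count, e.g. (1, 1, 1, 47).
    """
    return features.T @ (np.asarray(svf_demo, dtype=float) - svf_expected)


if __name__ == "__main__":
    # 4-state chain s0 -> s1 -> s2 -> s3 with s3 absorbing (single action).
    n = 4
    p_transition = np.zeros((n, n, 1))
    for s in range(n):
        p_transition[s, min(s + 1, n - 1), 0] = 1.0
    p_action = np.ones((n, 1))
    p_initial = np.array([1.0, 0.0, 0.0, 0.0])
    demo = [1, 1, 1, 47]

    svf = expected_svf(p_transition, p_action, p_initial, horizon=50)
    print(svf)  # visitation 1, 1, 1, 47
    print(maxent_gradient(np.eye(n), demo, svf))  # all zeros: the SVFs match

    # Zeroing s3's outgoing transitions counts s3 only once, so the
    # s3 component of the gradient is stuck at 46.
    p_leave = p_transition.copy()
    p_leave[n - 1] = 0.0
    svf_leave = expected_svf(p_leave, p_action, p_initial, horizon=50)
    print(svf_leave)  # visitation 1, 1, 1, 1
    print(maxent_gradient(np.eye(n), demo, svf_leave))  # 0, 0, 0, 46
```

The horizon passed to the forward pass should also equal the demonstration length (50, or 40 in the later run): without mass loss the expected SVF sums to the horizon, so if the two lengths differ, the totals differ and the gradient cannot reach zero either.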