
If the trajectory stays in the terminal state (for a limited number of steps)

Open ArezooAalipanah opened this issue 1 year ago • 3 comments

Hi, thank you so much for this amazing repo. I have been trying to build my own environment, but I ran into some issues.

What if we have something like this: going from s0 to s1 to s2 and then staying in s3 forever? (I changed the value iteration, so my trajectories are now all 50 steps long.) My svf then looks something like (1, 1, 1, 47, 0, ..., 0).

The problem is that my zs and za start getting very big and then become nan, which makes my omega nan as well. I was wondering if you have any idea what the problem is and how I can fix it. I am reading Dr. Ziebart's thesis but still have no clue how to tackle this (since z_terminal is 1, I am thinking that may be what causes the problem).

If you have any ideas, I would be very grateful if you could share your thoughts. Thanks again!
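To make the zs/za overflow concrete: one plausible cause is that, with an absorbing terminal state, every backward sweep multiplies zs at that state by exp(reward) again, so the values can exceed the float range once the rewards grow. The sketch below is not the repo's code. It is a minimal log-space variant of the same recursion, assuming a transition tensor laid out as `p_transition[s, s_next, a]`; the names `log_space_action_probabilities` and `_logsumexp` are hypothetical.

```python
import numpy as np


def _logsumexp(x, axis):
    """Stable log(sum(exp(x))) along `axis`; slices that are all -inf give -inf."""
    m = np.max(x, axis=axis, keepdims=True)
    m_safe = np.where(np.isfinite(m), m, 0.0)  # avoid (-inf) - (-inf)
    with np.errstate(divide="ignore"):
        out = np.log(np.sum(np.exp(x - m_safe), axis=axis, keepdims=True)) + m_safe
    return np.squeeze(out, axis=axis)


def log_space_action_probabilities(p_transition, terminal, reward, n_iter=None):
    """Local action probabilities computed from log(zs) and log(za).

    Assumed layout: p_transition[s, s_next, a]; reward is one value per state.
    """
    n_states, _, n_actions = p_transition.shape
    if n_iter is None:
        n_iter = 2 * n_states

    with np.errstate(divide="ignore"):
        log_p = np.log(p_transition)  # -inf for impossible transitions

    # log(zs): 0 at terminal states (zs = 1), -inf elsewhere (zs = 0)
    log_zs = np.full(n_states, -np.inf)
    log_zs[terminal] = 0.0

    for _ in range(n_iter):
        # log za[s, a] = r[s] + log sum_{s'} P[s, s', a] * zs[s']
        log_za = reward[:, None] + _logsumexp(log_p + log_zs[None, :, None], axis=1)
        # log zs[s] = log sum_a za[s, a]
        log_zs = _logsumexp(log_za, axis=1)

    # States that can never reach a terminal state would still give nan here.
    return np.exp(log_za - log_zs[:, None])


# Toy chain s0 -> s1 -> s2 -> s3 with s3 absorbing, and deliberately large rewards.
n_states, n_actions = 4, 2
p_transition = np.zeros((n_states, n_states, n_actions))
for s in range(n_states):
    p_transition[s, min(s + 1, n_states - 1), 0] = 1.0  # action 0: step right
    p_transition[s, s, 1] = 1.0                         # action 1: stay
reward = np.full(n_states, 500.0)

policy = log_space_action_probabilities(p_transition, [3], reward)
```

With rewards this large, exponentiating the reward directly would already overflow in float64, while the log-space values only grow linearly with the number of sweeps.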

ArezooAalipanah avatar Feb 05 '24 01:02 ArezooAalipanah

Here is a bit more info. This is my trajectory (I made it of length 40 this time): [image] The first iteration starts from an initialization of 1, and the next array shows my parameters after the first iteration. After the second iteration, however, they all end up being nan: [image]

ArezooAalipanah avatar Feb 05 '24 01:02 ArezooAalipanah

Hi, I'm sorry for the long silence. I will likely need some time to look into this, and as I was a bit busy with work-related things lately I never got around to it. Things should be less stressful now, so I will try to look into it this weekend.

qzed avatar Apr 16 '24 19:04 qzed

Thank you so much. I made some modifications, like normalizing the rewards and weights each time to avoid going to infinity, but I still have to keep the number of iterations limited since it will never converge. I am looking forward to your insight as well.
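For reference, one possible reading of "normalizing the weights each time" is to rescale the weight vector after every gradient step whenever its norm exceeds a fixed bound. The snippet below is only a sketch of that idea, not the exact modification described above; the function name `project_weights` and the bound `max_norm` are hypothetical.

```python
import numpy as np


def project_weights(theta, max_norm=1.0):
    """Rescale theta so its L2 norm does not exceed max_norm (assumed bound)."""
    norm = np.linalg.norm(theta)
    if norm > max_norm:
        return theta * (max_norm / norm)
    return theta


# Usage: apply after each optimizer update of the reward weights.
theta_large = project_weights(np.array([3.0, 4.0]), max_norm=1.0)
theta_small = project_weights(np.array([0.1, 0.2]), max_norm=1.0)
```

This keeps the weights bounded, but it does not by itself make the optimization converge, so a limit on the number of iterations may still be needed.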

ArezooAalipanah avatar Apr 23 '24 13:04 ArezooAalipanah