tlc-baselines
Questions about the definition of some variables
Dear researcher, sorry to bother you. I have carefully reviewed the code in your GitHub repository and have some questions. May I discuss them with you?
- Regarding the definition of action values: I discovered that there are 8 action types (0 to 7) in your code, which are obtained directly from the simulation environment. What do these 8 action types mean? Many papers I have read, such as CoLight, use 4 action types.
- Regarding the travel time: in the metric file, travel_time = current_time - self.vehicle_enter_time[vehicle]. Why not use leave_time - enter_time?
- Regarding the reward design: in the MADDPG algorithm, the reward includes reward += (self.action == self.last_action) * 2. What is the reason for this term? If you could reply amidst your busy schedule, I would be very grateful!
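To make the second and third questions concrete, here is a minimal sketch of the quantities being asked about. It is not the repository's code: the function names and the `bonus` parameter are hypothetical, and only the two formulas come from the lines quoted above.

```python
def travel_time_running(current_time, enter_time):
    """Time spent in the network so far; defined even if the vehicle has not left."""
    return current_time - enter_time


def travel_time_completed(leave_time, enter_time):
    """Time from entry to exit; only defined once the vehicle has left."""
    return leave_time - enter_time


def shaped_reward(base_reward, action, last_action, bonus=2):
    """Add a fixed bonus when the agent keeps the same action as in the previous step.

    Mirrors `reward += (self.action == self.last_action) * 2`; the bool is
    treated as 0 or 1 in the multiplication.
    """
    return base_reward + (action == last_action) * bonus


if __name__ == "__main__":
    # A vehicle that entered at t=40 and is still in the network at t=100.
    print(travel_time_running(current_time=100, enter_time=40))
    # The same vehicle if it left at t=90.
    print(travel_time_completed(leave_time=90, enter_time=40))
    # Keeping the action earns the bonus; switching does not.
    print(shaped_reward(-5.0, action=3, last_action=3))
    print(shaped_reward(-5.0, action=3, last_action=1))
```

The two travel-time definitions agree only if the measurement is taken at the moment the vehicle leaves. Otherwise the first one also counts vehicles that are still in the network.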
Action space: according to CityFlow/examples/roadnet.json, the trafficLight of intersection_1_1 has 8 lightphases, and each phase allows specific roads to pass. That is my understanding.
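One way to check this reading is to list the phases of an intersection and see which road links each one opens. The sketch below is only illustrative: the key names (`intersections`, `trafficLight`, `lightphases`, `availableRoadLinks`) follow my understanding of the CityFlow roadnet format and should be verified against the actual file, the inline sample values are invented, and `phase_table` is a hypothetical helper.

```python
# Invented sample that imitates the assumed roadnet layout; in practice,
# load the real file with json.load(open("roadnet.json")).
SAMPLE_ROADNET = {
    "intersections": [
        {
            "id": "intersection_1_1",
            "trafficLight": {
                "lightphases": [
                    {"time": 30, "availableRoadLinks": [0, 4]},
                    {"time": 30, "availableRoadLinks": [2, 7]},
                    {"time": 30, "availableRoadLinks": [1, 5]},
                    {"time": 30, "availableRoadLinks": [3, 6]},
                    {"time": 30, "availableRoadLinks": [0, 1]},
                    {"time": 30, "availableRoadLinks": [4, 5]},
                    {"time": 30, "availableRoadLinks": [2, 3]},
                    {"time": 30, "availableRoadLinks": [6, 7]},
                ]
            },
        }
    ]
}


def phase_table(roadnet, intersection_id):
    """Map each phase index (the action id) to the road links that phase opens."""
    for intersection in roadnet["intersections"]:
        if intersection["id"] == intersection_id:
            phases = intersection["trafficLight"]["lightphases"]
            return {i: p["availableRoadLinks"] for i, p in enumerate(phases)}
    raise KeyError(intersection_id)


if __name__ == "__main__":
    table = phase_table(SAMPLE_ROADNET, "intersection_1_1")
    for action, links in table.items():
        print(f"action {action}: road links {links}")
```

If the real file matches this layout, the number of entries in the table is the size of the action space, which would explain the actions 0 to 7.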