D4RL
[Question] How was "medium" defined for the MuJoCo datasets?
The paper says that the MuJoCo medium datasets were generated by training SAC and early-stopping once a "medium" level of performance was reached. I'd like to know how this point was defined. Was it a specific relative score that uses the maximum score as a reference, e.g., 1/3 of the max score? Or were recordings of the agents reviewed to look for specific behaviors, e.g., the agent has learned to walk, but only very slowly?
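To make the first option concrete, this is roughly the kind of rule I have in mind. It is a hypothetical sketch, not D4RL code: the function names, the 1/3 fraction, and the optional random-policy baseline (`min_return`) are my own assumptions. It scans evaluation returns from successive SAC checkpoints and picks the first one whose relative score reaches the threshold.

```python
from typing import Optional, Sequence


def relative_score(ret: float, min_return: float, max_return: float) -> float:
    """Scale a return to [0, 1]-ish relative to a baseline and a reference max (hypothetical)."""
    return (ret - min_return) / (max_return - min_return)


def first_medium_checkpoint(
    eval_returns: Sequence[float],
    max_return: float,
    min_return: float = 0.0,  # assumed baseline, e.g. a random policy's return
    fraction: float = 1 / 3,  # assumed "medium" threshold from the question
) -> Optional[int]:
    """Return the index of the first checkpoint at or above `fraction`, else None."""
    for i, ret in enumerate(eval_returns):
        if relative_score(ret, min_return, max_return) >= fraction:
            return i  # early-stop here and collect the "medium" data
    return None
```

For example, with checkpoint returns `[500, 1200, 2100, 3000]` and a reference max of 6000, the threshold is about 2000, so the third checkpoint (index 2) would be chosen. If a random-policy baseline of 300 is subtracted first, the choice moves to index 3, which is why I'm also curious whether any baseline was used.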