Unable to Reproduce Paper Results
I attempted to replicate the results of the paper by running the provided codebase. However, I encountered difficulties in reproducing both the offline results and the results after online fine-tuning.
Could someone provide additional guidance? Any assistance in replicating the results would be greatly appreciated.
Thanks for your interest in our paper. Could you provide more details about your experiment? We will then offer the corresponding help.
We trained SO2 on the halfcheetah-random-v2 dataset offline for 3M steps, achieving an average return of 3000. However, after online fine-tuning for 100K steps, the average return only reached around 6000, which is significantly different from the results reported in the paper.
Additionally, I noticed a part of the SO2 code for which I couldn't find a corresponding implementation in DI-engine:
# smooth target policy
main_config.policy.learn.noise = True
main_config.policy.learn.noise_sigma = 0.3
main_config.policy.learn.noise_range = dict(
    min=-0.6,
    max=0.6,
)
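For context, these fields correspond to TD3-style target policy smoothing: Gaussian noise with standard deviation noise_sigma is sampled, clipped to noise_range, and added to the target policy's action before computing the target Q-value. A minimal sketch of that operation, assuming actions bounded in [-1, 1] (the function name and bounds are illustrative, not taken from the repository):
import torch

def smooth_target_action(target_action: torch.Tensor,
                         noise_sigma: float = 0.3,
                         noise_min: float = -0.6,
                         noise_max: float = 0.6) -> torch.Tensor:
    # Sample Gaussian noise, clip it to the configured range, then clamp the
    # perturbed action back into the assumed [-1, 1] action space.
    noise = (torch.randn_like(target_action) * noise_sigma).clamp(noise_min, noise_max)
    return (target_action + noise).clamp(-1.0, 1.0)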
Based on your feedback, I suspect that some code related to target policy smoothing was lost while the code was being merged. I will fix this problem and rerun a series of experiments. Further information will be posted in this issue.
Could you use a specific version of DI-engine as follows?
pip install 'DI-engine==0.5.0'
python3 -u so2/d4rl_main.py
I used version 0.5.0 of DI-engine, but I still get an error in the create_policy function.
TypeError: __init__() got an unexpected keyword argument 'ensemble_num'
This bug is caused by an incorrect model being returned by the def default_model(self) -> Tuple[str, List[str]]: method in DI-engine/ding/policy/edac.py. We fixed this issue in version 0.5.1.
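In DI-engine, default_model returns the registered model name and the module it is imported from. A rough sketch of what the corrected method should look like (the registered name 'edac' and the module path below are assumptions for illustration, not copied from the 0.5.1 release):
from typing import List, Tuple

class EDACPolicy:  # illustrative stand-in for the class in ding/policy/edac.py
    def default_model(self) -> Tuple[str, List[str]]:
        # Return the EDAC model, which accepts the ensemble_num argument
        # that the previously returned model raised a TypeError on.
        return 'edac', ['ding.model.template.edac']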
To resolve this, could you try upgrading to the newer version by running the following command?
pip install 'DI-engine==0.5.1'
Additionally, there might be conflicts between D4RL and Gym, so it's advisable to install a specific version of Gym to avoid issues. Please install Gym version 0.24.0 by using this command:
pip install 'gym==0.24.0'
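To confirm the environment picked up the right versions before rerunning, a quick check like this (run in the same Python environment) should print 0.5.1 and 0.24.0:
import ding
import gym

# Both should match the versions suggested above.
print('DI-engine:', ding.__version__)
print('gym:', gym.__version__)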
I used version 0.5.1 of DI-engine and version 0.24.0 of gym, but I still can only achieve the performance I previously described and cannot replicate the results from the paper.
Sorry, this code doesn't employ PVU as described in the paper.
We will fix this bug in the dev branch, please wait a little longer.🥺
If necessary, I can provide the source code and corresponding checkpoint files for replication purposes.
Could you please share your email address?
Here are the source code and checkpoints: Google Drive
You can reproduce the results as follows:
# 1. update main_config.policy.learn.learner.load_path to the path of the downloaded checkpoints
# 2. run the following commands
cd SO2
python exp/train.py
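For step 1, the assignment would look roughly like the following; the checkpoint filename here is a placeholder, so substitute the actual file downloaded from Google Drive:
# placeholder path; point this at the checkpoint file from the Google Drive archive
main_config.policy.learn.learner.load_path = './checkpoints/halfcheetah_random_offline.pth.tar'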
My email is [email protected]. Thanks!