Unable to Reproduce Paper Results

Open h-shawn opened this issue 1 year ago • 4 comments

I attempted to replicate the results of the paper by running the provided codebase. However, I encountered difficulties in reproducing both the offline results and the results after online fine-tuning.

Could someone provide additional guidance? Any assistance in replicating the results would be greatly appreciated.

h-shawn avatar Apr 21 '24 03:04 h-shawn

Thanks for your interest in our paper. Please provide more details about your experiment and we will offer the corresponding help.

PaParaZz1 avatar Apr 22 '24 11:04 PaParaZz1

We trained SO2 on the halfcheetah-random-v2 dataset offline for 3M steps, achieving an average return of 3000. However, after online fine-tuning for 100K steps, the average return only reached around 6000, which is significantly different from the results reported in the paper.

Additionally, I noticed that there is a part of the code in SO2 for which I couldn't find a corresponding implementation in the DI-engine:

# smooth target policy
main_config.policy.learn.noise = True
main_config.policy.learn.noise_sigma = 0.3
main_config.policy.learn.noise_range = dict(
	min=-0.6,
	max=0.6,
)
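
For context, these settings describe TD3-style target policy smoothing: clipped Gaussian noise is added to the target policy's action before computing the target Q-value, which regularizes the Q-value estimate. Below is a minimal sketch of that mechanism, assuming a PyTorch setup and actions normalized to [-1, 1]; it is illustrative only, not the DI-engine implementation:

import torch

def smoothed_target_action(target_actor, next_obs, sigma=0.3, noise_min=-0.6, noise_max=0.6):
    # Sample Gaussian noise with std sigma, clip it to [noise_min, noise_max],
    # add it to the deterministic target action, then clip to the action bounds.
    with torch.no_grad():
        action = target_actor(next_obs)
        noise = (torch.randn_like(action) * sigma).clamp(noise_min, noise_max)
        return (action + noise).clamp(-1.0, 1.0)  # assumes actions in [-1, 1]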

h-shawn avatar Apr 22 '24 12:04 h-shawn

Based on your feedback, I suspect that some code related to target policy smoothing was lost while merging the code. I will fix this problem and rerun a series of experiments; further updates will be posted in this issue.

PaParaZz1 avatar Apr 24 '24 04:04 PaParaZz1

Could you try a specific version of DI-engine, as follows?

pip install 'DI-engine==0.5.0'
python3 -u so2/d4rl_main.py

YinminZhang avatar May 21 '24 02:05 YinminZhang

I used version 0.5.0 of DI-engine, but I still get an error in the create_policy function:

TypeError: __init__() got an unexpected keyword argument 'ensemble_num'

h-shawn avatar Jun 01 '24 15:06 h-shawn

This bug is caused by the default_model(self) -> Tuple[str, List[str]] method in DI-engine/ding/policy/edac.py returning the wrong model. We addressed this issue in version 0.5.1.
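
For reference, a DI-engine policy's default_model hook returns a registered model name plus its import path, so the fix amounts to pointing it at the ensemble-aware EDAC model. A rough sketch of the shape of the fix follows; the registry name and module path are assumed, not copied from the actual 0.5.1 patch:

from typing import List, Tuple

class EDACPolicy:  # sketch of the relevant hook only
    def default_model(self) -> Tuple[str, List[str]]:
        # Must name a model whose __init__ accepts EDAC-specific config
        # keys such as ensemble_num; returning a non-ensemble model is
        # what raised the TypeError above.
        return 'edac', ['ding.model.template.edac']  # assumed name/path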

To resolve this, could you try upgrading to the newer version by running the following command?

pip install 'DI-engine==0.5.1'

Additionally, there might be conflicts between D4RL and Gym, so it's advisable to install a specific version of Gym to avoid issues. Please install Gym version 0.24.0 by using this command:

pip install 'gym==0.24.0'

YinminZhang avatar Jun 04 '24 03:06 YinminZhang

I used version 0.5.1 of DI-engine and version 0.24.0 of gym, but I still can only achieve the performance I previously described and cannot replicate the results from the paper.

h-shawn avatar Jun 07 '24 12:06 h-shawn

Sorry, this code doesn't employ PVU as described in the paper.

We will fix this bug in the dev branch; please wait a little longer. 🥺

YinminZhang avatar Jun 15 '24 13:06 YinminZhang

If necessary, I can provide the source code and corresponding checkpoint files for replication purposes.

Could you please share your email address?

YinminZhang avatar Jun 17 '24 05:06 YinminZhang

Here are the source code and checkpoints: Google Drive

You can reproduce the results as follows:

# 1. update main_config.policy.learn.learner.load_path to point at the downloaded checkpoint
# 2. then run:
cd SO2
python exp/train.py
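
For clarity, step 1 amounts to an edit like the following in the experiment config; the checkpoint filename below is a placeholder, so substitute the actual file from the Drive folder:

# hypothetical path; use the checkpoint downloaded from Google Drive
main_config.policy.learn.learner.load_path = './ckpt/halfcheetah_random_offline.pth'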

YinminZhang avatar Jun 17 '24 06:06 YinminZhang

My email is [email protected]. Thanks!

h-shawn avatar Jun 17 '24 08:06 h-shawn