OfflineRL
A collection of offline reinforcement learning algorithms.
Since COMBO is derived from CQL, I wonder why the automatic adjustment used in CQL is not discussed in COMBO. Is it not useful in COMBO?
- This is for compatibility with newer versions of PyTorch, where `mode` was changed to a property of the distribution classes. See https://github.com/pytorch/pytorch/pull/76690 for details.
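  For reference, a minimal sketch of the pattern that breaks and a workaround, assuming a `TanhNormal` like the one in `offlinerl/utils/net/tanhpolicy.py` (the class body and the `_mode` name below are illustrative, not the repo's exact code):

  ```python
  import torch
  from torch.distributions import Normal


  class TanhNormal(Normal):
      """Normal distribution whose samples are squashed by tanh (sketch)."""

      def __init__(self, normal_mean, normal_std):
          super().__init__(normal_mean, normal_std)
          # On PyTorch >= 1.12 the old assignment raises
          # "AttributeError: can't set attribute", because `mode` is now
          # a read-only property on Distribution subclasses:
          #     self.mode = torch.tanh(normal_mean)
          self._mode = torch.tanh(normal_mean)

      @property
      def mode(self):
          # Overriding the property keeps `dist.mode` working on both
          # old and new PyTorch versions.
          return self._mode


  dist = TanhNormal(torch.zeros(3), torch.ones(3))
  print(dist.mode)  # tensor([0., 0., 0.])
  ```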
```
Usage of Session.__init__ is deprecated!
Traceback (most recent call last):
  File "examples/train_d4rl.py", line 19, in <module>
    fire.Fire(run_algo)
  File "/root/anaconda3/envs/off/lib/python3.7/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
...
```
When I run the command

```bash
python examples/train_task.py --algo_name=mopo --exp_name=halfcheetah --task HalfCheetah-v3 --task_data_type low --task_train_num 2
```

it shows:

```
  File "examples/train_task.py", line 19, in <module>
    fire.Fire(run_algo)
  File "/home/lksgcc/.pyenv/versions/anaconda3-5.0.1/envs/mujoco_py/lib/python3.8/site-packages/fire/core.py", line 141, in ...
```
Given that COMBO is derived from CQL, why is `CQL_loss` calculated differently in the two implementations?
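For context, a minimal sketch of the conservative penalty both algorithms build on, assuming the standard CQL(H) form (logsumexp of Q over sampled actions minus the dataset Q-value; the function and variable names are illustrative). As I understand the papers, COMBO computes the pushed-down term over model-generated state-action pairs rather than over actions sampled at dataset states, which is one source of the difference:

```python
import torch


def cql_penalty(q_sampled, q_data):
    """CQL(H) penalty sketch: push Q down on sampled actions and up on
    dataset actions. q_sampled has shape (num_sampled_actions, batch),
    q_data has shape (batch,)."""
    return torch.logsumexp(q_sampled, dim=0).mean() - q_data.mean()


q_sampled = torch.randn(10, 256)  # Q on 10 sampled actions per state
q_data = torch.randn(256)         # Q on the dataset actions
penalty = cql_penalty(q_sampled, q_data)
```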
Notes from setting up the environment
# Creating and activating the environment

Create a conda environment. I use Python 3.7 here because it is the last Python version supported by TensorFlow 1.X; later Python versions can only run TensorFlow 2.0 and above. TensorFlow 2.0 was a major overhaul, and a lot of old code no longer works on it.

```bash
conda create -n offline python=3.7
```

Re-initialize conda.

```bash
conda init
```

Activate the environment we just created.

```bash
conda activate offline
```

# TensorFlow and...
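The TensorFlow step above is truncated; as a minimal sketch of what it presumably installs, assuming TensorFlow 1.15 (the last 1.X release) is the intended version:

```bash
# Inside the `offline` environment: TensorFlow 1.15 is the final 1.X
# release and supports Python 3.7.
pip install tensorflow==1.15
```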
```bash
$ python examples/train_task.py --algo_name=cql --exp_name=halfcheetah --task HalfCheetah-v3 --task_data_type low --task_train_num 100
2023-11-03 at 16:35:56.112 | INFO | Use cql algorithm!
running build_ext
2023-11-03 at 16:35:56.618 | INFO | obs...
```
1. In `setup.py`, I suggest changing `'sklearn'` to `'scikit-learn'` in `install_requires`; if only `sklearn` is installed, the `from sklearn.preprocessing import MinMaxScaler` step in `data.py` raises an error.
2. The `test_on_real_env` function in `evaluation/neorl.py` contains the check `if "sp" or "sales" in env._name`, but d4rl environments do not seem to have a `_name` attribute, so this raises an error (see the sketch after this list).
3. `ray==1.2` raises an error:

```
Traceback (most recent call last):
  File "/home/luofm/Utils/pyenv/offlinerl/lib/python3.8/site-packages/ray/new_dashboard/agent.py", line 323, in <module>
    loop.run_until_complete(agent.run())
  File "/usr/lib/python3.8/asyncio/base_events.py", line 616, in ...
```
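On point 2, besides the missing `_name` attribute, the condition itself is a Python precedence pitfall: `"sp" or "sales" in env._name` parses as `("sp") or ("sales" in env._name)`, and the non-empty literal `"sp"` is always truthy, so the branch always runs. A minimal sketch of the intended check; the `getattr` fallback and the helper name are my own suggestions, not the repo's code:

```python
def is_sp_or_sales(env) -> bool:
    # getattr with a default avoids AttributeError on d4rl envs,
    # which lack `_name`.
    name = getattr(env, "_name", "")
    # Buggy original: `if "sp" or "sales" in name` is always True,
    # because the non-empty string "sp" is truthy on its own.
    # Intended: test each substring against the name separately.
    return "sp" in name or "sales" in name


class FakeEnv:
    _name = "sales-v0"


print(is_sp_or_sales(FakeEnv()))  # True
print(is_sp_or_sales(object()))   # False: no `_name`, falls back to ""
```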
Hi there, after getting everything installed, I ran the provided script and hit this error:

```
File "/home/hsy/PycharmProjects/OfflineRL/offlinerl/utils/net/tanhpolicy.py", line 29, in __init__
    self.mode = torch.tanh(normal_mean)
AttributeError: can't set attribute
```

PyCharm also...
When I run MOPO, I find that the cumulative reward drops sharply. Why does this happen? Has this ever happened to you too? Thanks