omnisafe
Some questions about the sac_lag algorithm
Required prerequisites
- [X] I have read the documentation https://omnisafe.readthedocs.io.
- [X] I have searched the Issue Tracker and Discussions that this hasn't already been reported. (+1 or comment there if it has.)
- [X] Consider asking first in a Discussion.
Questions
Hello OmniSafe team, and thank you first of all for your generous work. I am a first-year graduate student currently studying the sac_lag algorithm, and I am using the sac_lag implementation from your package. However, during training the cost never decreases. What could be causing this, and how should I solve it?
Are you using the version installed from the source code or the version installed via PyPI? Also, on which task is it not effective? This is because sac_lag requires different learning rates and Lagrangian hyperparameters for different tasks. For specific recommendations, please refer to omnisafe/configs/SACLag.yaml.
Hello Professor Zhou, I am using the version installed from source, and the model is a custom power system dispatch model I defined myself. The constraint is that the gap between real-time generation and load stays within a certain range; the specific parameter settings are shown in the figure. Could the problem be with the parameters? Another question: after agent.learn(), is the environment closed automatically? I tried to extract some data produced during training for plotting, but what I got was an empty array. If I don't want the environment to be closed automatically, how should I set that? Thank you for your help!
I am not familiar with the dynamic characteristics of power system dispatch, but I can offer some empirical experience in tuning hyperparameters. Generally, the initial value of the Lagrangian multiplier `lagrange_cfgs:lagrangian_multiplier_init`, its learning rate `lagrange_cfgs:lambda_lr`, and the learning rate of the actor model `model_cfgs:actor:lr` have a significant impact on sac_lag, and smaller values are recommended. You can try using `examples/benchmarks/run_experiment_grid.py` to run a parallel search over hyperparameter combinations for algorithm debugging. You could also show more details of the training results, such as training curves and the core code of the training environment, to help us provide more specific suggestions.
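For reference, here is a minimal sketch of overriding these values through `custom_cfgs` when constructing the agent. The key nesting is an assumption read off the colon-separated paths above (`lagrange_cfgs:lambda_lr`, `model_cfgs:actor:lr`); please verify it against `omnisafe/configs/off-policy/SACLag.yaml`, and treat the numbers as placeholders.

```python
import omnisafe

# Placeholder values; the nesting mirrors the colon-separated key paths above
# and should be checked against SACLag.yaml for your installed version.
custom_cfgs = {
    'lagrange_cfgs': {
        'lagrangian_multiplier_init': 0.001,  # smaller initial multiplier
        'lambda_lr': 1e-4,                    # smaller multiplier learning rate
    },
    'model_cfgs': {
        'actor': {
            'lr': 3e-4,                       # smaller actor learning rate
        },
    },
}

agent = omnisafe.Agent('SACLag', 'SafetyPointGoal0-v0', custom_cfgs=custom_cfgs)
agent.learn()
```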
Thank you for your reply. I also ran into a problem while plotting curves. Following the example in the tutorial, I wrote the code below for a test run. The agent learning part succeeded, but the plotting part raised the bug shown in the figure. It looks like part of the path is duplicated, but I did not find this bug when I inspected the algo_wrapper file and the plotter file.
```python
import omnisafe

env_id = 'SafetyPointGoal0-v0'
custom_cfgs = {
    'train_cfgs': {
        'total_steps': 10000,
    },
}
agent = omnisafe.Agent('SACLag', env_id, custom_cfgs=custom_cfgs)
agent.learn()
agent.plot(smooth=1)
```
I think this should be because the Windows system does not support parsing the corresponding `./runs` path. You can directly use `examples/plot.py` to visualize the results of the previous training run.
Thank you, the previous problem has been solved, but a new one has arisen. I set the initial value and learning rate of the Lagrange multiplier, but in the training log output I observed that the multiplier's value stays at the initial value I set and is never updated automatically, as shown in the picture. What might the problem be?
You can set `warmup_epochs` in `omnisafe/configs/off-policy/SACLag.yaml` to the epoch at which you want the Lagrange multiplier to start updating.
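As an alternative to editing the YAML file, a hedged sketch of passing the same override via `custom_cfgs` is shown below; the assumption that `warmup_epochs` sits under `algo_cfgs` follows the off-policy config layout and should be confirmed in `SACLag.yaml`.

```python
import omnisafe

# Assumed key path: algo_cfgs -> warmup_epochs; confirm the exact nesting in
# omnisafe/configs/off-policy/SACLag.yaml before relying on it.
custom_cfgs = {
    'algo_cfgs': {
        'warmup_epochs': 0,  # let the Lagrange multiplier start updating immediately
    },
}

agent = omnisafe.Agent('SACLag', 'SafetyPointGoal0-v0', custom_cfgs=custom_cfgs)
agent.learn()
```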
Thank you for your answer. Can I continue training a model that has already been trained for a while? If so, how should I do it?
This is a useful feature. However, we currently do not support it and will include it in future updates.
I use the sac_lag algorithm to train my model, but when a better solution exists, for example the reward for the action (0.9998, 1, 1) is clearly better than for the action (1, 1, 1), why does the model keep choosing (1, 1, 1)? Which parameters should I adjust to avoid this problem?
Getting stuck in suboptimal action selection is a common problem in reinforcement learning. I am not very clear about the specific settings of your environment, but the problem may be related to the following reasons:
- Actions are sampled from a stochastic distribution, so there is randomness in the selection.
- The policy has weak exploration capability and gets stuck in a local optimum.
For situation 1, you can check whether the action you consider optimal is chosen when deterministic=True. For situation 2, you can debug by searching over more training parameters, e.g. `algo_cfgs:alpha` and `model_cfgs:actor:lr`.
Hello, I want to use different data inputs during training and evaluation. For example, give the agent a large set of data during training and let it randomly select one instance each time, while specifying the data used for evaluation. Where should this be set?
I tried re-selecting the data in the reset part and in the init part for training, but when I printed the data in the step part, I found that the data in each step was still the same set of data, as shown in the following figure. What might cause this problem?
During agent.learn(), the reset section doesn't seem to be used, because a print statement I added to the reset section never produced any output.
There is another problem: the same code and the same Python environment run training successfully on my local computer, but on the server side there is a bug as shown in the figure. How should I fix it?
use different data input in algorithm training and evaluation
You can try setting different reset rules to achieve this. For example, add an argument named `seed` to `reset`: when `seed` is `None`, select randomly from the possible input scenarios; when `seed` is a specified integer, select the corresponding state.
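To make this concrete, below is a minimal, self-contained sketch (plain Python, not OmniSafe's API; the class name, `self.data`, and the assumption that each row of a DataFrame encodes one numeric scenario are all placeholders) of a `reset` that draws a random scenario during training and a fixed one during evaluation.

```python
import numpy as np
import pandas as pd


class DispatchEnvSketch:
    """Illustrative only: reset(seed=None) draws a random scenario for
    training, while reset(seed=<int>) deterministically picks a scenario
    for evaluation. Each row of `data` is assumed to be numeric."""

    def __init__(self, data: pd.DataFrame):
        self.data = data  # one row per candidate scenario
        self._row = None

    def reset(self, seed=None):
        if seed is None:
            # training: pick a random scenario for this episode
            self._row = self.data.sample(1).iloc[0]
        else:
            # evaluation: pick a reproducible scenario indexed by the seed
            self._row = self.data.iloc[seed % len(self.data)]
        obs = self._row.to_numpy(dtype=np.float32)
        return obs, {}
```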
I found that the data in each step was still the same set of data
I'm sorry I couldn't directly identify the problem and am not entirely clear on what your main concern is. I have made two assumptions and hope they are helpful:
- The environment returns the same state after each reset. This issue often originates in the reset function. I noticed that the core of your `reset` function is `self.data.sample(1).iloc[0]`. You can check whether `self.data` contains the diverse information you need, and whether random sampling can actually cover the scenarios you wish to sample.
- The agent's state is the same after each step. I carefully read the code you provided, and the three variables you print do not seem to be reassigned or updated in the `step` function. Perhaps you should check the logic of the environment's interaction dynamics.
During agent.learn, the reset section doesn't seem to be used
OmniSafe uses the AutoReset wrapper to automatically reset the environment. When the environment returns terminated or truncated as True, it calls the environment's reset function. If your reset is not called, it may be because the logic for terminated or truncated is not completed in the step function.
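As a small self-contained illustration (plain Python, not the OmniSafe wrapper itself) of why `reset` is only reached once `step` signals the end of an episode:

```python
class CountingEnvSketch:
    """Toy environment: an episode lasts `horizon` steps. If `terminated`
    never becomes True, an AutoReset-style wrapper has no signal to call
    reset(), so the same episode data is reused forever."""

    def __init__(self, horizon: int = 24):
        self._horizon = horizon  # e.g. 24 dispatch intervals per episode
        self._t = 0

    def reset(self, seed=None):
        print('reset called')  # the debug print discussed above
        self._t = 0
        return 0.0, {}

    def step(self, action):
        self._t += 1
        obs, reward, cost = float(self._t), 0.0, 0.0
        terminated = self._t >= self._horizon  # end-of-episode signal
        truncated = False
        return obs, reward, cost, terminated, truncated, {}
```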
the same code, and the same Python environment, can successfully run training on the local computer, but there is a bug on the server side
Based on the information you provided, I cannot accurately grasp the problem. It seems that the version of the code you mentioned does not align with the latest version of OmniSafe. Is the version installed on your server consistent with the one on your local machine?
Thank you for your answer. After reinstalling the latest version of the omnisafe library on the server side, it runs successfully, thank you very much! Your answer is of great help to my work! But I still have problems when debugging the training set and the test set. I checked my terminated settings; as you say, the algorithm should enter the reset section when terminated is true. I added a terminated output at the end of step and a print test statement in the reset part. The code and part of the output are shown in the figures: terminated is true, but the reset part is never entered. Where might the problem be? Thanks again for your help!
When defining the environment class, did you set `need_auto_reset_wrapper` to `True`?
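For context, a hedged sketch of where this attribute lives, based on OmniSafe's environment-customization pattern; the class name and task id are hypothetical, and the import path and attribute list should be verified against your installed version.

```python
from omnisafe.envs.core import CMDP


class DispatchEnv(CMDP):
    # Hypothetical task id for illustration only.
    _support_envs = ['Dispatch-v0']
    # Without this flag, OmniSafe does not wrap the environment with the
    # AutoReset wrapper, so reset() is not called automatically even when
    # step() returns terminated/truncated = True.
    need_auto_reset_wrapper = True
    # ... __init__, step, reset and the other required CMDP methods go here ...
```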
When defining the environment class, did you set `need_auto_reset_wrapper` to `True`?

Thank you very much for your help, my problem is solved!
Since there has been no response for a long time, we will close this issue. Please feel free to reopen it if you encounter any new problems!