HARL
Official implementation of HARL algorithms based on PyTorch.
Hi, your work is interesting and I hope to replicate and build on it. But I am worried that my computational resources are not enough to support this. I hope you...
Thank you for your work. I'm trying to use your algorithm in my own environment, but I find that the distribution entropy of the continuous action space always rises, even in...
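For context on why entropy can climb: a diagonal-Gaussian policy's entropy depends only on its standard deviation, so a steadily rising entropy usually just means the learned log-std is growing, and it is worth logging log-std directly. A minimal sketch in plain PyTorch (illustrative, not HARL's code) showing the relationship:

```python
import math
import torch
from torch.distributions import Normal

# For a Gaussian, entropy = 0.5 * log(2*pi*e*sigma^2) per dimension,
# i.e. it increases monotonically with log_std.
for log_std in [-1.0, 0.0, 1.0]:
    dist = Normal(torch.zeros(1), torch.tensor([math.exp(log_std)]))
    print(log_std, dist.entropy().item())  # entropy grows with log_std
```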
Hi, I am interested in your work on heterogeneous-agent reinforcement learning and the algorithms you have developed. I noticed that your current implementation focuses on algorithms like HAPPO, HATRPO, etc.,...
At first there was no problem, but after 210,000 steps an error was suddenly reported. It looks like the problem is caused by the Normal function; the code is in harl/models/policy_models/squashed_gaussian_policy.py line...
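For readers hitting the same crash: `torch.distributions.Normal` validates its parameters, so a NaN or non-positive scale raises a ValueError mid-training. A minimal sketch reproducing the failure and one common defensive pattern (the clamp bounds here are an assumption, not the repo's values):

```python
import torch
from torch.distributions import Normal

mean = torch.zeros(3)
log_std = torch.tensor([0.0, float("nan"), -1.0])
try:
    # A NaN scale violates Normal's GreaterThan(0) constraint -> ValueError
    Normal(mean, log_std.exp())
except ValueError as err:
    print(err)

# Common defensive pattern: replace NaNs and bound log_std before
# exponentiating, so the scale is always finite and positive.
LOG_STD_MIN, LOG_STD_MAX = -20.0, 2.0
safe_std = torch.nan_to_num(log_std, nan=LOG_STD_MIN).clamp(LOG_STD_MIN, LOG_STD_MAX).exp()
dist = Normal(mean, safe_std)
```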
Hi, I have a question about HASAC. When I run HASAC on SMAC, it always gets stuck after "finish warmup, start training". I didn't have this issue when I used...
```python
if self.state_type == "EP":
    data = (
        share_obs[:, 0],    # (n_threads, share_obs_dim)
        obs,                # (n_agents, n_threads, obs_dim)
        actions,            # (n_agents, n_threads, action_dim)
        available_actions,  # None or (n_agents, n_threads, action_number)
        rewards[:,...
```
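A side note on the `share_obs[:, 0]` indexing above: under the "EP" state type every agent receives the same global state, so taking agent 0's copy collapses the agent dimension. A small NumPy sketch of that assumption (shapes are illustrative, not pulled from the repo):

```python
import numpy as np

# Same global state replicated across agents, as under "EP".
n_threads, n_agents, share_obs_dim = 4, 3, 8
share_obs = np.repeat(np.random.rand(n_threads, 1, share_obs_dim), n_agents, axis=1)

# Indexing agent 0 yields the (n_threads, share_obs_dim) layout in the comment.
assert share_obs[:, 0].shape == (n_threads, share_obs_dim)
assert np.allclose(share_obs[:, 0], share_obs[:, 1])  # identical across agents
```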
Hello, thank you for sharing your code; it has been incredibly useful! I am currently trying to use your MAPPO or HAPPO implementations to run my tasks, where my actions are...
I'm using the HASAC algorithm in my own environment, and the reward oscillates. What could be the cause?
I noticed that with on-policy algorithms, data collection is done in the `run` function of `OnPolicyBaseRunner`. However, in my experiments, I noticed that my environment would not...
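Since the excerpt is cut off, the following is only a generic illustration of the on-policy collection pattern the question refers to, not `OnPolicyBaseRunner.run` itself. Vectorized environments usually auto-reset finished sub-environments inside `step`, which is a frequent source of "my environment never resets" confusion:

```python
def collect_rollout(envs, policy, n_steps):
    """Generic fixed-length rollout against a gym-style vectorized env
    (illustrative sketch; HARL's actual runner differs)."""
    obs = envs.reset()
    for _ in range(n_steps):
        actions = policy(obs)
        obs, rewards, dones, infos = envs.step(actions)
        # Vectorized envs typically auto-reset any sub-env whose episode
        # ended, so `obs` is already the first observation of the next
        # episode for those sub-envs; no explicit reset call is needed.
        yield obs, rewards, dones, infos
```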