Costa Huang issues

Results: 96 issues of Costa Huang

Summarization TL;DR

Still WIP

get benchmark results with TRL's pipeline

This PR attempts to get some benchmark results with TRL's sentiment pipe instead of training a reward model.

`rate` in case of a draw (1vs1)

Hi, this is an awesome package and really helpful. Quick question: in the case of a draw (1vs1), how should we use the `rate` function? In [trueskill](https://trueskill.org/), there is a...

aws_eks_cluster_auth is temporary

Hi @adrnswanberg and @vanpelt, My [PR](https://github.com/wandb/local/pull/18) introduced the `aws_eks_cluster_auth` to replace the `aws-iam-authenticator`. However, today I just realized this `aws_eks_cluster_auth` is temporary. See the documentation [here](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/eks_cluster_auth). In particular: `aws_eks_cluster_auth` "generate...

`nan` in MultivariateNormalDiag log prob

Hello, thanks for this awesome repo! We have run into a slight issue when using distrax, which produces `nan` at https://github.com/vwxyzjn/cleanrl/pull/300. See the following reproduction script: ```python from typing import Sequence...

[autodoc] Parse `PEP 257` style docstring

Copied from https://github.com/rr-/docstring_parser/issues/71#issue-1318744037 > The approved [PEP 257](https://peps.python.org/pep-0257/#what-is-a-docstring) mentions the so-called "attribute docstrings", which are string literals in the line after where an attribute is defined. These kind of docstrings...
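To make the idea of an attribute docstring concrete, here is a minimal sketch that collects them with the standard-library `ast` module. The `Config` class, its fields, and the `attribute_docstrings` helper are hypothetical names chosen for illustration; they are not part of autodoc or docstring_parser.

```python
import ast
import inspect
from typing import Dict

# Hypothetical example source: two documented attributes and one undocumented.
SOURCE = '''
class Config:
    learning_rate: float = 3e-4
    """Step size used by the optimizer."""

    seed: int = 1
    """Seed for the random number generators."""

    untagged: int = 0
'''


def attribute_docstrings(source: str, class_name: str) -> Dict[str, str]:
    """Map attribute names to the string literal that directly follows them."""
    docs: Dict[str, str] = {}
    for node in ast.walk(ast.parse(source)):
        if not (isinstance(node, ast.ClassDef) and node.name == class_name):
            continue
        # Look at each statement together with the statement right after it.
        for stmt, nxt in zip(node.body, node.body[1:]):
            if not isinstance(stmt, (ast.Assign, ast.AnnAssign)):
                continue
            is_string_literal = (
                isinstance(nxt, ast.Expr)
                and isinstance(nxt.value, ast.Constant)
                and isinstance(nxt.value.value, str)
            )
            if not is_string_literal:
                continue
            targets = stmt.targets if isinstance(stmt, ast.Assign) else [stmt.target]
            for target in targets:
                if isinstance(target, ast.Name):
                    docs[target.id] = inspect.cleandoc(nxt.value.value)
    return docs


print(attribute_docstrings(SOURCE, "Config"))
```

Python does not keep these string literals at runtime, which is why the sketch works on source text rather than on the imported class.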

Asynchronous API for `ParallelRLEnv`

Hello, this work looks pretty cool and I am looking forward to using it in the future. I was wondering if you would be interested in implementing [EnvPool's Asynchronous API](https://github.com/sail-sg/envpool#asynchronous-api), which looks...

Matching the response tokens

Hello, thanks for the nice reference code! I noticed the following code tries to match the response tokens, but it might match the instruction tokens instead https://github.com/databrickslabs/dolly/blob/aaa0ecb5a5555f99e57e6582f1fb3d289f31940f/training/trainer.py#L60-L63 This is because...
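The excerpt is cut off before it states the cause, so the following is only a sketch of one way such a mismatch can arise. It assumes the instruction and response markers share a leading token. The token ids and both helper functions are hypothetical and are not taken from the Dolly code.

```python
from typing import List, Optional


def find_first_token(tokens: List[int], first_token: int) -> Optional[int]:
    """Return the index of the first occurrence of a single token id."""
    for i, tok in enumerate(tokens):
        if tok == first_token:
            return i
    return None


def find_subsequence(tokens: List[int], pattern: List[int]) -> Optional[int]:
    """Return the start index of the first full match of `pattern`."""
    n = len(pattern)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == pattern:
            return i
    return None


# Hypothetical ids: 10 = "###", 11 = "Instruction:", 12 = "Response:".
INSTRUCTION_KEY = [10, 11]
RESPONSE_KEY = [10, 12]

# instruction marker, two prompt tokens, response marker, two answer tokens
tokens = INSTRUCTION_KEY + [50, 51] + RESPONSE_KEY + [60, 61]

# Matching only the first token of the response key stops at the instruction marker.
print(find_first_token(tokens, RESPONSE_KEY[0]))

# Matching the whole key finds the real start of the response marker.
print(find_subsequence(tokens, RESPONSE_KEY))
```

Under that assumption, comparing the full marker sequence rather than its first token avoids the false match.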

How to run Atari experiments?

Is there any way to run the Atari experiments? I was trying to tweak things, but it seems https://github.com/IouJenLiu/HTS-RL/blob/7972340c765ef45d2bda353a197b78e0b844f2bd/env_step.py#L7 is built pretty much around `gfootball.env`. Thanks

`openrlbenchmark` integration

Hi all, this is very cool stuff. I especially like that there is an MBPO implementation. Would you be interested in using [wandb](https://wandb.ai/) to contribute experiment runs to [`openrlbenchmark`](https://github.com/openrlbenchmark/openrlbenchmark) utilities?...