Costa Huang

Results: 96 issues by Costa Huang

This PR attempts to get some benchmark results with TRL's sentiment pipe instead of training a reward model.

Hi, this is an awesome package and really helpful. Quick question: in the case of a draw (1 vs 1), how should we use the `rate` function? In [trueskill](https://trueskill.org/), there is a...

Hi @adrnswanberg and @vanpelt, my [PR](https://github.com/wandb/local/pull/18) introduced `aws_eks_cluster_auth` to replace `aws-iam-authenticator`. However, I realized today that `aws_eks_cluster_auth` is temporary. See the documentation [here](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/eks_cluster_auth). In particular: `aws_eks_cluster_auth` "generate...

Hello, thanks for this awesome repo! We have had a slight issue with distrax producing `nan` values in https://github.com/vwxyzjn/cleanrl/pull/300. See the following reproduction script: ```python from typing import Sequence...

Copied from https://github.com/rr-/docstring_parser/issues/71#issue-1318744037 > The approved [PEP 257](https://peps.python.org/pep-0257/#what-is-a-docstring) mentions the so-called "attribute docstrings", which are string literals in the line after where an attribute is defined. These kind of docstrings...
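To illustrate what an "attribute docstring" looks like, here is a minimal sketch that collects them with the standard-library `ast` module. Python does not keep these string literals at runtime, so the sketch reads them from source text. The `Config` class, its fields, and the `attribute_docstrings` helper are hypothetical names made up for this example; this is not how docstring_parser or any other library necessarily does it.

```python
import ast

# Hypothetical example source: each string literal sits on the line
# right after the attribute it documents.
SOURCE = '''
class Config:
    learning_rate: float = 3e-4
    """Step size used by the optimizer."""

    seed: int = 1
    """Seed for the random number generators."""

    untracked = None
'''


def attribute_docstrings(source: str, class_name: str) -> dict:
    """Map attribute names to the string literal that directly follows them."""
    docs = {}
    for node in ast.walk(ast.parse(source)):
        if not (isinstance(node, ast.ClassDef) and node.name == class_name):
            continue
        # Pair every statement in the class body with its successor.
        for stmt, following in zip(node.body, node.body[1:]):
            is_assignment = isinstance(stmt, (ast.Assign, ast.AnnAssign))
            is_string = (
                isinstance(following, ast.Expr)
                and isinstance(following.value, ast.Constant)
                and isinstance(following.value.value, str)
            )
            if not (is_assignment and is_string):
                continue
            # AnnAssign has a single target; Assign may have several.
            target = stmt.target if isinstance(stmt, ast.AnnAssign) else stmt.targets[0]
            if isinstance(target, ast.Name):
                docs[target.id] = following.value.value
    return docs


docs = attribute_docstrings(SOURCE, "Config")
for name, text in docs.items():
    print(f"{name}: {text}")
```

Attributes without a trailing string literal, such as `untracked` here, are simply left out of the result.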

Hello, this work looks pretty cool, and I am looking forward to using it in the future. I was wondering if you would be interested in implementing [EnvPool's Asynchronous API](https://github.com/sail-sg/envpool#asynchronous-api), which looks...

Hello, thanks for the nice reference code! I noticed the following code tries to match the response tokens, but it might match the instruction tokens instead: https://github.com/databrickslabs/dolly/blob/aaa0ecb5a5555f99e57e6582f1fb3d289f31940f/training/trainer.py#L60-L63. This is because...

Is there any way to run the Atari experiments? I was trying to tweak things, but it seems https://github.com/IouJenLiu/HTS-RL/blob/7972340c765ef45d2bda353a197b78e0b844f2bd/env_step.py#L7 is built pretty much around `gfootball.env`. Thanks

Hi all, this is very cool stuff. I especially like that there is an MBPO implementation. Would you be interested in using [wandb](https://wandb.ai/) to contribute experiment runs to [`openrlbenchmark`](https://github.com/openrlbenchmark/openrlbenchmark) utilities?...