Clara Pohland
@kashif or @lewtun can you confirm if the current changes work for you as well?
@younesbelkada @kashif, is this a desired feature for you as well? If yes, is someone already implementing it or should we come up with a PR?
Hi, back then we solved this use case in my team with an internal workaround. I’ve been a bit short on time recently; hopefully next weekend I can check on how to...
@kashif that would also make sense. But then some shared functions (e.g. `_tokenize`, `_process_tokens`) would need to move to a shared place, maybe `trainer/utils.py`
@kashif there are no errors or warnings in stdout/stderr; it just stops at some point after the NaN rewards appear, so I cannot provide a stack trace here. However,...
Important note here: the crash only occurs after the training starts showing NaN values; without them it does not occur. I even saw cases where all results converge to NaN values ``` {'loss': 0.0,...
The output below is from a test with very unbalanced data, namely 2k desired completions and 10k undesired ones. I know that a ratio between 4:3 and 1:1 is required...
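For reference, the arithmetic behind that ratio can be sketched as follows. This is only an illustration, not TRL code: `desirable_weight_range` is a hypothetical helper, and it assumes that the ratio in question is the weighted count of desired examples over the count of undesired ones, with the undesired weight fixed at 1.

```python
from fractions import Fraction


def desirable_weight_range(n_desired, n_undesired,
                           low=Fraction(1), high=Fraction(4, 3)):
    """Hypothetical helper (not part of TRL).

    Returns (min_w, max_w) such that w * n_desired / n_undesired
    lies in [low, high], assuming the undesired weight stays at 1.
    """
    if n_desired <= 0 or n_undesired <= 0:
        raise ValueError("both counts must be positive")
    raw_ratio = Fraction(n_desired, n_undesired)
    return low / raw_ratio, high / raw_ratio


# Counts from the test described above: 2k desired, 10k undesired.
min_w, max_w = desirable_weight_range(2_000, 10_000)
print(f"desired-example weight between {float(min_w):.2f} and {float(max_w):.2f}")
```

Under these assumptions, the 2k/10k split has a raw ratio of 1:5, so the desired examples would need a weight of roughly 5 to 6.67 to land in the 1:1 to 4:3 range.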
> [kashif](https://github.com/kashif) commented [1 hour ago](https://github.com/huggingface/trl/issues/1447#issuecomment-2009259490) @claralp so the main hyperparam that could affect this is the batch size as it needs a good mix of good and bad examples,...
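To get a feeling for how batch size interacts with class balance, here is a rough sketch that estimates how often a batch would contain only good or only bad examples. It is an approximation, not something taken from TRL: `prob_single_class_batch` is a hypothetical helper, it assumes examples are drawn independently (sampling with replacement), and the counts used are illustrative.

```python
def prob_single_class_batch(n_desired, n_undesired, batch_size):
    """Hypothetical helper: probability that a batch has only one class.

    Assumes each example in the batch is drawn independently, which
    approximates shuffled sampling when the dataset is much larger
    than the batch.
    """
    p_desired = n_desired / (n_desired + n_undesired)
    all_desired = p_desired ** batch_size
    all_undesired = (1 - p_desired) ** batch_size
    return all_desired + all_undesired


# Illustrative counts: 2k desired vs. 10k undesired completions.
for batch_size in (4, 16, 64):
    p = prob_single_class_batch(2_000, 10_000, batch_size)
    print(f"batch_size={batch_size}: P(single-class batch) = {p:.4f}")
```

With a skewed dataset and a small per-device batch, single-class batches become common under this approximation, which is consistent with the suggestion that the batch size matters here.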
closed with #1499 and #1514