Clara Pohland comments

[KTO]: Fix nan losses and crashing job

@kashif or @lewtun can you confirm if the current changes work for you as well?

KTO - support loading the adapter twice

@younesbelkada @kashif, is this a desired feature for you as well? If yes, is someone already implementing it or should we come up with a PR?

es_out: support Upstream Servers

Hi, back then my team solved this use case with an internal workaround. I've been a bit short on time recently; hopefully next weekend I can check on how to...

KtoTrainer: BCO improvements

@kashif that would also make sense. But then some shared functions (e.g. `_tokenize`, `_process_tokens`) would need to move to a shared place, maybe `trainer/utils.py`.

KTO training produces NaN rewards

@kashif there are no errors or warnings in stdout/stderr; the job just stops at some point after the NaN rewards appear, so I cannot provide a stack trace here. However,...

KTO training produces NaN rewards

Important note here: the crash only appears after the training shows NaN values; otherwise it doesn't happen. I even saw cases where all results converge to NaN values: `{'loss': 0.0,...`

KTO training produces NaN rewards

The output below is from a test with very unbalanced data, namely 2k desired completions and 10k undesired ones. I know that a ratio between 4:3 and 1:1 is required...
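To make the 2k/10k example concrete, here is a minimal arithmetic sketch. It assumes the ratio in question is the weighted count of desirable to undesirable examples, i.e. `desirable_weight * n_desirable : undesirable_weight * n_undesirable`. That is how we understand the guidance around the `desirable_weight` / `undesirable_weight` options of TRL's `KTOConfig`. The helper functions are hypothetical and do not call TRL.

```python
def weighted_ratio(n_desirable, n_undesirable, desirable_weight=1.0, undesirable_weight=1.0):
    """Weighted desirable:undesirable ratio (assumed definition, see lead-in)."""
    return (desirable_weight * n_desirable) / (undesirable_weight * n_undesirable)


def desirable_weight_range(n_desirable, n_undesirable, undesirable_weight=1.0):
    """Return (low, high) desirable_weight values that put the weighted
    ratio between 1:1 (low) and 4:3 (high), keeping undesirable_weight fixed."""
    base = undesirable_weight * n_undesirable / n_desirable
    return base, base * 4.0 / 3.0


n_desirable, n_undesirable = 2_000, 10_000  # split from the test described above
print(weighted_ratio(n_desirable, n_undesirable))  # unit weights: heavily skewed
low, high = desirable_weight_range(n_desirable, n_undesirable)
print(f"desirable_weight between {low:.2f} and {high:.2f}")
```

With unit weights the ratio is only 0.2 (1:5), so under this assumption one would raise `desirable_weight` to somewhere between 5 and about 6.67, or keep it at 1 and lower `undesirable_weight` correspondingly.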

KTO training produces NaN rewards

> [kashif](https://github.com/kashif) [commented](https://github.com/huggingface/trl/issues/1447#issuecomment-2009259490): @claralp so the main hyperparam that could affect this is the batch size as it needs a good mix of good and bad examples,...
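The batch-size point can be quantified with a rough model. Assuming each example in a batch is drawn independently (with replacement) from a pool with the 2k desirable / 10k undesirable split mentioned earlier, the chance that a batch of size B contains no desirable example at all is (1 - p)^B, where p is the desirable fraction. This is an illustrative approximation, not how TRL's data loader actually samples, and the function name is hypothetical.

```python
def prob_batch_missing_class(batch_size: int, n_desirable: int, n_undesirable: int) -> dict:
    """Probability that a batch has no desirable / no undesirable examples.

    Simplifying assumption: each of the batch_size examples is drawn
    independently with replacement, so the draws are i.i.d. Bernoulli(p).
    """
    p_desirable = n_desirable / (n_desirable + n_undesirable)
    return {
        "no_desirable": (1.0 - p_desirable) ** batch_size,
        "no_undesirable": p_desirable ** batch_size,
    }


for batch_size in (2, 4, 8, 16, 32):
    probs = prob_batch_missing_class(batch_size, 2_000, 10_000)
    print(f"batch_size={batch_size:>2}: P(no desirable) = {probs['no_desirable']:.3f}")
```

Under this model a batch of 4 has roughly a 48% chance of containing no desirable completion, while at a batch size of 32 the chance drops below 1%.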

KTO training produces NaN rewards

Closed with #1499 and #1514.