Brad Hilton

35 comments by Brad Hilton

OK, I was able to reproduce this on a T4. Thank you, @Aranxtonel.

It appears to be an OOM error, but it's failing silently.

> RuntimeError: Sleep mode can only be used for one instance per process.

In my experience, this error is usually raised because of insufficient GPU memory.
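Since the underlying cause is often memory rather than sleep mode itself, a pre-flight check can surface it with a clearer message. Below is a minimal sketch, assuming the engine is configured with vLLM's `gpu_memory_utilization` argument (a fraction of *total* device memory). The helper names `check_gpu_headroom` and `check_current_device` are hypothetical, not part of vLLM or ART. The optional wrapper uses PyTorch's `torch.cuda.mem_get_info()`, which returns `(free, total)` in bytes.

```python
GIB = 1024 ** 3


def check_gpu_headroom(free_bytes: int, total_bytes: int, gpu_memory_utilization: float) -> None:
    """Raise early if the requested share of GPU memory is not actually free.

    Hypothetical helper: `gpu_memory_utilization` mirrors the vLLM engine
    argument of the same name (fraction of total device memory to claim).
    """
    if not 0.0 < gpu_memory_utilization <= 1.0:
        raise ValueError("gpu_memory_utilization must be in (0, 1]")
    requested = int(total_bytes * gpu_memory_utilization)
    if free_bytes < requested:
        raise RuntimeError(
            f"requested {requested / GIB:.1f} GiB of GPU memory, "
            f"but only {free_bytes / GIB:.1f} GiB is free"
        )


def check_current_device(gpu_memory_utilization: float) -> None:
    """Optional: run the check against the current CUDA device via PyTorch."""
    import torch  # imported lazily so the pure-arithmetic check works without torch

    free, total = torch.cuda.mem_get_info()
    check_gpu_headroom(free, total, gpu_memory_utilization)
```

For example, on a roughly 15 GiB card with only 12 GiB free, `check_gpu_headroom(12 * GIB, 15 * GIB, 0.9)` raises because about 13.5 GiB is requested, while a utilization of `0.7` (about 10.5 GiB) passes.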

@linpan, what's the error you're seeing? It also looks like you may need to run `scripts/run_checks.sh --fix`.

I don't see `#中国` or `中国` ("China") in `pyproject.toml`.

@linpan, can you run `scripts/run_checks.sh --fix`? Once the formatting issue is fixed, I can merge.

In my experience, doing more gradient updates works better. Here's [some recent work](https://arxiv.org/abs/2507.07101) that finds the same thing.

@zfflxx, does that help answer your question?

It should be fairly straightforward to implement; it just probably takes up more space than the LoRA adapters. I'm uncertain whether saving the optimizer state should be the default behavior...
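To give a rough sense of "more space", here is a back-of-the-envelope sketch. It assumes an AdamW-style optimizer, which keeps two moment tensors (`exp_avg` and `exp_avg_sq`) per trainable parameter, stored in fp32, while the adapter weights are saved in bf16. The model shape (32 blocks, q/k/v/o projections of 4096 x 4096, LoRA rank 8) and the names `LoraTarget`, `lora_param_count` and `checkpoint_sizes` are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass

MIB = 1024 ** 2


@dataclass(frozen=True)
class LoraTarget:
    """A linear layer wrapped by a LoRA adapter (shapes are hypothetical)."""
    d_in: int
    d_out: int


def lora_param_count(targets: list[LoraTarget], rank: int) -> int:
    # Each wrapped layer gets A (rank x d_in) and B (d_out x rank).
    return sum(rank * (t.d_in + t.d_out) for t in targets)


def checkpoint_sizes(n_params: int, adapter_dtype_bytes: int = 2,
                     optim_dtype_bytes: int = 4, moments: int = 2) -> tuple[int, int]:
    """Return (adapter_bytes, optimizer_state_bytes), ignoring tiny scalars like step counts."""
    adapter = n_params * adapter_dtype_bytes
    optimizer = n_params * optim_dtype_bytes * moments
    return adapter, optimizer


# Hypothetical model: 32 blocks x 4 attention projections, each 4096 x 4096, rank 8.
targets = [LoraTarget(4096, 4096) for _ in range(32 * 4)]
n = lora_param_count(targets, rank=8)
adapter, optimizer = checkpoint_sizes(n)
print(f"{n:,} LoRA params: adapter {adapter / MIB:.0f} MiB, "
      f"optimizer state {optimizer / MIB:.0f} MiB")
```

Under these assumptions this prints `8,388,608 LoRA params: adapter 16 MiB, optimizer state 64 MiB`, so the optimizer state is about four times the size of the adapter. That ratio is the trade-off to weigh when deciding whether saving it should be on by default.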