Richard Gong
I'm also running into this (albeit with 4 A100 80GB). Wondering if there is a way we can work around it - happy to make a contribution if the direction...
I found a workaround that involves allowing CPU offloading during the phase where the state dict is saved. I verified that end-to-end 70B training with checkpointing works on [this repo](https://github.com/modal-labs/llama-finetuning). I...
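For reference, a minimal sketch of that kind of workaround, assuming the training loop uses PyTorch FSDP (the helper name `save_full_state_dict` and the save path are hypothetical, not from the repo above):

```python
# Sketch: when gathering a full (unsharded) state dict for a large FSDP
# model, offload the gathered parameters to CPU so the full checkpoint
# does not have to fit in a single GPU's memory.
import torch
import torch.distributed as dist
from torch.distributed.fsdp import (
    FullyShardedDataParallel as FSDP,
    StateDictType,
    FullStateDictConfig,
)


def save_full_state_dict(model: FSDP, path: str) -> None:
    # offload_to_cpu=True moves each gathered shard to host memory;
    # rank0_only=True materializes the full state dict only on rank 0.
    cfg = FullStateDictConfig(offload_to_cpu=True, rank0_only=True)
    with FSDP.state_dict_type(model, StateDictType.FULL_STATE_DICT, cfg):
        state = model.state_dict()
    if dist.get_rank() == 0:
        torch.save(state, path)
```

This must run inside an initialized process group with an FSDP-wrapped model; the context manager only changes how `state_dict()` gathers shards, so the rest of the checkpointing code is unchanged.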
The issue is reproducible with `min_size=100, max_size=100`. Increasing the size of the pool is not a feasible workaround here. Crucially, awaiting a new connection from the pool **should not block...