Zach Mueller
Nope, you do not. That is also extremely valid (and why the non-yaml option exists, for situations where we need to wrap/call it separately and a yaml makes it complicated)
Hi all, we finally narrowed down the two sources of leakage in the implementation that we could improve. #2089 will fix this, reducing your memory by a _significant amount_. For...
@maxidl can you share your modified code? Curious what those exceptions are that exist for "no good reason"
Thanks @maxidl, here's the approach the team has decided on: 1. I'll put a PR in today that lets you *explicitly disable* the blocking behavior, and...
What kind of gpu setup are you using?
@DragonDRLI can you try specifying "gpu_ids" as "all" in your config? Open `~/.cache/huggingface/accelerate/default_config.yaml` (e.g. with `vim`) and set: ``` gpu_ids: all ``` (Note: no quotes)
@DragonDRLI can you try perhaps upgrading your torch version? (Doubtful, but having some issues recreating this). E.g.: `pip install light-the-torch; ltt install torch torchvision -U`
As Sylvain says, it's your dataset that's the issue. I would recommend ensuring that there are enough samples for at least 1 full batch between all your GPUs (so if...
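A quick sketch of the arithmetic above (the helper names here are hypothetical, not part of Accelerate): one full step across all processes consumes `num_gpus * per_device_batch_size` samples, so the dataset must have at least that many.

```python
# Hypothetical sanity-check helpers (not part of the Accelerate API):
# verify a dataset has enough samples for at least one full batch
# across all GPUs.

def min_samples_needed(num_gpus: int, per_device_batch_size: int) -> int:
    # Each GPU draws per_device_batch_size samples per step, so one
    # full step across all processes needs this many samples total.
    return num_gpus * per_device_batch_size

def has_enough_samples(dataset_len: int, num_gpus: int,
                       per_device_batch_size: int) -> bool:
    return dataset_len >= min_samples_needed(num_gpus, per_device_batch_size)
```

For example, 4 GPUs with a per-device batch size of 8 need at least 32 samples; a 31-sample dataset would leave one GPU short on the first step.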
@efsotr during my tests I'm able to have it all work properly, however you'll need to specify a new port in your config to launch on, which may stem your...
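For reference, a minimal sketch of what "specify a new port" looks like: the port Accelerate uses for the main process can be set via the `main_process_port` key in the config yaml (the specific port number below is just an example).

```yaml
# ~/.cache/huggingface/accelerate/default_config.yaml (excerpt)
# Pick any free port; 29501 here is only an example value.
main_process_port: 29501
```

The same can be done per-launch with `accelerate launch --main_process_port 29501 ...` without editing the config file.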
Big model inference is only for _inference_, not training at this time. Oops: I'm wrong!