Zach Mueller

Results 368 comments of Zach Mueller

Another thing to watch out for, which @lewtun pointed out to me: @clam004, make sure none of the libraries you are using try to initialize CUDA later on in...

@Liweixin22 ensure that nothing has initialized CUDA before you run `notebook_launcher`. If there are still issues, please let us know which notebook you are trying to run.
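A quick way to check this from a notebook cell, as a minimal sketch: it assumes PyTorch is the only library that could have touched CUDA, and the helper name `cuda_is_initialized` is made up for illustration. `torch.cuda.is_initialized()` reports whether the current process has already set up a CUDA context.

```python
import importlib.util


def cuda_is_initialized() -> bool:
    """Return True if this process has already initialized CUDA through PyTorch.

    Hypothetical helper for illustration. It returns False when torch is not
    installed, since nothing could have initialized CUDA through it.
    """
    if importlib.util.find_spec("torch") is None:
        return False
    import torch  # importing torch alone does not initialize CUDA

    return torch.cuda.is_initialized()


if cuda_is_initialized():
    print("CUDA is already initialized: restart the kernel before launching.")
else:
    print("CUDA is not initialized yet: safe to launch.")
```

If the check passes, the launch itself would be something like `notebook_launcher(training_function, args=(), num_processes=2)`, where `training_function` stands in for your own training loop. If it fails, restarting the kernel and moving any `.cuda()` / `.to("cuda")` calls inside the launched function is usually the simplest fix.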

Try installing from git via `pip install git+https://github.com/huggingface/accelerate`. I believe this is fixed on main

We'll have a release out this week with the non-breaking version, thanks @Sewens @ghadiaravi13!

@csaroff ran just fine for me:

```python
import os
import torch
from accelerate import notebook_launcher
from fastai.test_utils import synth_learner
import fastai.distributed

os.environ["NCCL_P2P_DISABLE"]="1"
os.environ["NCCL_IB_DISABLE"]="1"

def test_nb_launcher():
    learn = synth_learner()
    with learn.distrib_ctx(in_notebook=True):
        ...
```

Great work @yuvalkirstain! We likely wouldn't want to use submitit, considering their last commit was 6 months ago, which doesn't inspire confidence. Do you know of any other SLURM management...

Thanks to @lvwerra, here's a template script that can be used for doing SLURM:

```bash
#!/bin/bash
#SBATCH --job-name=XYZ
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=1 # crucial - only 1 task per dist...
```

@surak re: your last point, you could probably write a collection of `config.yaml` files, one per node, store them in a single folder, and pass that in, perhaps?...
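A rough sketch of what that per-node folder could look like in practice, not an existing Accelerate feature. The `node_<id>.yaml` naming scheme and both helper names are assumptions made up for illustration. `SLURM_NODEID` is the relative node index SLURM sets for each task, and `--config_file` is the existing `accelerate launch` flag.

```python
import os
from pathlib import Path


def config_for_node(config_dir, node_id=None):
    """Pick this node's config file from a folder of per-node configs.

    Assumes a hypothetical layout of node_0.yaml, node_1.yaml, ... and
    falls back to SLURM_NODEID (or 0) when node_id is not given.
    """
    if node_id is None:
        node_id = int(os.environ.get("SLURM_NODEID", "0"))
    path = Path(config_dir) / f"node_{node_id}.yaml"
    if not path.is_file():
        raise FileNotFoundError(f"No config for node {node_id} in {config_dir}")
    return path


def launch_command(config_dir, script, node_id=None):
    """Build (but do not run) the accelerate launch command for this node."""
    config = config_for_node(config_dir, node_id)
    return ["accelerate", "launch", "--config_file", str(config), script]
```

Each node would then run the returned command, for example via `subprocess.run(launch_command("configs", "train.py"), check=True)`, so the only thing that differs between nodes is which file in the folder gets picked up.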