Accelerate not working when setting subset of GPUs as visible CUDA devices
System Info
/home/felipe/anaconda3/envs/cuda_12.1/lib/python3.11/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: '/home/felipe/anaconda3/envs/cuda_12.1/lib/python3.11/site-packages/torchvision/image.so: undefined symbol: _ZN3c1017RegisterOperatorsD1Ev'If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
warn(
Copy-and-paste the text below in your GitHub issue
- `Accelerate` version: 0.27.0
- Platform: Linux-5.15.0-94-generic-x86_64-with-glibc2.35
- Python version: 3.11.6
- Numpy version: 1.26.4
- PyTorch version (GPU?): 2.2.0 (True)
- PyTorch XPU available: False
- PyTorch NPU available: False
- System RAM: 125.63 GB
- GPU type: NVIDIA GeForce RTX 4090
- `Accelerate` default config:
- compute_environment: LOCAL_MACHINE
- distributed_type: MULTI_GPU
- mixed_precision: no
- use_cpu: False
- debug: False
- num_processes: 2
- machine_rank: 0
- num_machines: 1
- rdzv_backend: static
- same_network: False
- main_training_function: main
- downcast_bf16: False
- tpu_use_cluster: False
- tpu_use_sudo: False
I have one 3090 and two 4090 GPUs.
Information
- [ ] The official example scripts
- [X] My own modified scripts
Tasks
- [ ] One of the scripts in the `examples/` folder of Accelerate or an officially supported `no_trainer` script in the `examples` folder of the `transformers` repo (such as `run_no_trainer_glue.py`)
- [X] My own task or dataset (give details below)
Reproduction
I have this training function:
```python
def train_ddp_accelerate(CFG, fold_id, train, output_path):
    accelerator = Accelerator(split_batches=True, mixed_precision='fp16')
    # accelerator = Accelerator(mixed_precision='fp16')
    set_seed(CFG.seed)
    device = accelerator.device  # 'cuda' # torch.device(CFG.device)

    train_path_label, val_path_label, _, _ = get_path_label(fold_id, train_all)
    train_transform, val_transform = get_transforms(CFG)

    train_dataset = HMSHBACSpecDataset(**train_path_label, transform=train_transform)
    val_dataset = HMSHBACSpecDataset(**val_path_label, transform=val_transform)

    train_loader = torch.utils.data.DataLoader(
        train_dataset, batch_size=CFG.batch_size, pin_memory=True, num_workers=4, shuffle=True, drop_last=True)
    val_loader = torch.utils.data.DataLoader(
        val_dataset, batch_size=CFG.batch_size, pin_memory=True, num_workers=4, shuffle=False, drop_last=False)

    model = HMSHBACSpecModel(
        model_name=CFG.model_name, pretrained=True, num_classes=6, in_channels=1)
    # model = torch.nn.parallel.DataParallel(model, device_ids=[0, 1, 2])

    optimizer = optim.AdamW(params=model.parameters(), lr=CFG.lr, weight_decay=CFG.weight_decay)
    scheduler = lr_scheduler.OneCycleLR(
        optimizer=optimizer, epochs=CFG.max_epoch,
        pct_start=0.0, steps_per_epoch=len(train_loader),
        max_lr=CFG.lr, div_factor=25, final_div_factor=4.0e-01
    )

    loss_func = KLDivLossWithLogits()
    loss_func.to(device)
    # loss_func = torch.nn.parallel.DataParallel(loss_func, device_ids=[0, 1, 2])
    loss_func_val = KLDivLossWithLogits()
    loss_func_val.to(device)
    # loss_func_val = torch.nn.parallel.DataParallel(loss_func_val, device_ids=[0, 1, 2])

    # Send everything through `accelerator.prepare`
    train_loader, val_loader, model, optimizer, scheduler = accelerator.prepare(
        train_loader, val_loader, model, optimizer, scheduler
    )

    best_val_loss = 1.0e+09
    best_epoch = 0
    train_loss = 0

    # Train for a single epoch
    for epoch in range(1, CFG.max_epoch + 1):
        epoch_start = time()
        model.train()
        for batch in train_loader:
            # batch = to_device(batch, device)
            x, t = batch["data"], batch["target"]
            optimizer.zero_grad()
            with accelerator.autocast():
                y = model(x)
                loss = loss_func(y, t)
            accelerator.backward(loss)
            optimizer.step()
            if not accelerator.optimizer_step_was_skipped:
                scheduler.step()
            train_loss += loss.detach()
        train_loss /= len(train_loader)

        # Evaluate
        model.eval()
        correct = 0
        val_loss = 0
        with torch.no_grad():
            for batch in val_loader:
                x, t = batch["data"], batch["target"]
                # x = to_device(x, device)
                y = model(x)
                val_loss += loss_func_val(y, t).detach()
        val_loss /= len(val_loader)

        accelerator.wait_for_everyone()
        total_val_loss = accelerator.reduce(val_loss).cpu()
        total_train_loss = accelerator.reduce(train_loss).cpu()

        if val_loss < best_val_loss:
            best_epoch = epoch
            best_val_loss = val_loss
            # print("save model")
            if accelerator.is_main_process:
                accelerator.save_model(model, str(output_path) + f'snapshot_epoch_{epoch}')
        # reduced_tensor = accelerator.reduce(process_tensor, reduction="sum")

        elapsed_time = time() - epoch_start
        accelerator.wait_for_everyone()
        if accelerator.is_main_process:
            print(
                f"[epoch {epoch}] train loss: {total_train_loss: .6f}, val loss: {total_val_loss: .6f}, elapsed_time: {elapsed_time: .3f}")
        accelerator.wait_for_everyone()

        if epoch - best_epoch > CFG.es_patience:
            if accelerator.is_main_process:
                print("Early Stopping!")
            accelerator.wait_for_everyone()
            break

        train_loss = 0

    # print(f'Accuracy: {100. * correct / len(val_loader.dataset)}')
    accelerator.end_training()
    accelerator.clear()
```
When launched like this, it runs as expected:

```python
import os

os.environ["NCCL_P2P_DISABLE"] = "1"

for fold_id in FOLDS[3:]:
    output_path = Path(f"fold{fold_id}")
    output_path.mkdir(exist_ok=True)
    print(f"[fold{fold_id}]")
    notebook_launcher(train_ddp_accelerate, args=(CFG, fold_id, train, output_path), num_processes=3, mixed_precision='fp16')
```
But when launched like this:

```python
import os

os.environ['CUDA_DEVICE_ORDER'] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "1,2"
os.environ["NCCL_P2P_DISABLE"] = "1"

for fold_id in FOLDS[3:]:
    output_path = Path(f"fold{fold_id}")
    output_path.mkdir(exist_ok=True)
    print(f"[fold{fold_id}]")
    notebook_launcher(train_ddp_accelerate, args=(CFG, fold_id, train, output_path), num_processes=2, mixed_precision='fp16')
```

I get:
---------------------------------------------------------------------------
ProcessRaisedException                    Traceback (most recent call last)
File ~/anaconda3/envs/cuda_12.1/lib/python3.11/site-packages/accelerate/launchers.py:200, in notebook_launcher(function, args, num_processes, mixed_precision, use_port, master_addr, node_rank, num_nodes)
    199 try:
--> 200     start_processes(launcher, args=args, nprocs=num_processes, start_method="fork")
    201 except ProcessRaisedException as e:

File ~/anaconda3/envs/cuda_12.1/lib/python3.11/site-packages/torch/multiprocessing/spawn.py:197, in start_processes(fn, args, nprocs, join, daemon, start_method)
    196 # Loop on join until it returns True or raises an exception.
--> 197 while not context.join():
    198     pass

File ~/anaconda3/envs/cuda_12.1/lib/python3.11/site-packages/torch/multiprocessing/spawn.py:158, in ProcessContext.join(self, timeout)
    157 msg += original_trace
--> 158 raise ProcessRaisedException(msg, error_index, failed_process.pid)
ProcessRaisedException:
-- Process 0 terminated with the following error:
Traceback (most recent call last):
File "/home/felipe/anaconda3/envs/cuda_12.1/lib/python3.11/site-packages/torch/cuda/__init__.py", line 315, in _lazy_init
queued_call()
File "/home/felipe/anaconda3/envs/cuda_12.1/lib/python3.11/site-packages/torch/cuda/__init__.py", line 183, in _check_capability
capability = get_device_capability(d)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/felipe/anaconda3/envs/cuda_12.1/lib/python3.11/site-packages/torch/cuda/__init__.py", line 439, in get_device_capability
prop = get_device_properties(device)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/felipe/anaconda3/envs/cuda_12.1/lib/python3.11/site-packages/torch/cuda/__init__.py", line 457, in get_device_properties
return _get_device_properties(device) # type: ignore[name-defined]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: device >= 0 && device < num_gpus INTERNAL ASSERT FAILED at "/opt/conda/conda-bld/pytorch_1704987288773/work/aten/src/ATen/cuda/CUDAContext.cpp":50, please report a bug to PyTorch. device=2, num_gpus=
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/felipe/anaconda3/envs/cuda_12.1/lib/python3.11/site-packages/torch/multiprocessing/spawn.py", line 68, in _wrap
fn(i, *args)
File "/home/felipe/anaconda3/envs/cuda_12.1/lib/python3.11/site-packages/accelerate/utils/launch.py", line 570, in __call__
self.launcher(*args)
File "/tmp/ipykernel_1310472/2664963675.py", line 3, in train_ddp_accelerate
accelerator = Accelerator(mixed_precision='fp16')
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/felipe/anaconda3/envs/cuda_12.1/lib/python3.11/site-packages/accelerate/accelerator.py", line 378, in __init__
self.state = AcceleratorState(
^^^^^^^^^^^^^^^^^
File "/home/felipe/anaconda3/envs/cuda_12.1/lib/python3.11/site-packages/accelerate/state.py", line 771, in __init__
PartialState(cpu, **kwargs)
File "/home/felipe/anaconda3/envs/cuda_12.1/lib/python3.11/site-packages/accelerate/state.py", line 236, in __init__
torch.cuda.set_device(self.device)
File "/home/felipe/anaconda3/envs/cuda_12.1/lib/python3.11/site-packages/torch/cuda/__init__.py", line 408, in set_device
torch._C._cuda_setDevice(device)
File "/home/felipe/anaconda3/envs/cuda_12.1/lib/python3.11/site-packages/torch/cuda/__init__.py", line 321, in _lazy_init
raise DeferredCudaCallError(msg) from e
torch.cuda.DeferredCudaCallError: CUDA call failed lazily at initialization with error: device >= 0 && device < num_gpus INTERNAL ASSERT FAILED at "/opt/conda/conda-bld/pytorch_1704987288773/work/aten/src/ATen/cuda/CUDAContext.cpp":50, please report a bug to PyTorch. device=2, num_gpus=
CUDA call was originally invoked at:
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/home/felipe/anaconda3/envs/cuda_12.1/lib/python3.11/site-packages/ipykernel_launcher.py", line 18, in <module>
app.launch_new_instance()
File "/home/felipe/anaconda3/envs/cuda_12.1/lib/python3.11/site-packages/traitlets/config/application.py", line 1075, in launch_instance
app.start()
File "/home/felipe/anaconda3/envs/cuda_12.1/lib/python3.11/site-packages/ipykernel/kernelapp.py", line 739, in start
self.io_loop.start()
File "/home/felipe/anaconda3/envs/cuda_12.1/lib/python3.11/site-packages/tornado/platform/asyncio.py", line 195, in start
self.asyncio_loop.run_forever()
File "/home/felipe/anaconda3/envs/cuda_12.1/lib/python3.11/asyncio/base_events.py", line 607, in run_forever
self._run_once()
File "/home/felipe/anaconda3/envs/cuda_12.1/lib/python3.11/asyncio/base_events.py", line 1922, in _run_once
handle._run()
File "/home/felipe/anaconda3/envs/cuda_12.1/lib/python3.11/asyncio/events.py", line 80, in _run
self._context.run(self._callback, *self._args)
File "/home/felipe/anaconda3/envs/cuda_12.1/lib/python3.11/site-packages/ipykernel/kernelbase.py", line 542, in dispatch_queue
await self.process_one()
File "/home/felipe/anaconda3/envs/cuda_12.1/lib/python3.11/site-packages/ipykernel/kernelbase.py", line 531, in process_one
await dispatch(*args)
File "/home/felipe/anaconda3/envs/cuda_12.1/lib/python3.11/site-packages/ipykernel/kernelbase.py", line 437, in dispatch_shell
await result
File "/home/felipe/anaconda3/envs/cuda_12.1/lib/python3.11/site-packages/ipykernel/ipkernel.py", line 359, in execute_request
await super().execute_request(stream, ident, parent)
File "/home/felipe/anaconda3/envs/cuda_12.1/lib/python3.11/site-packages/ipykernel/kernelbase.py", line 775, in execute_request
reply_content = await reply_content
File "/home/felipe/anaconda3/envs/cuda_12.1/lib/python3.11/site-packages/ipykernel/ipkernel.py", line 446, in do_execute
res = shell.run_cell(
File "/home/felipe/anaconda3/envs/cuda_12.1/lib/python3.11/site-packages/ipykernel/zmqshell.py", line 549, in run_cell
return super().run_cell(*args, **kwargs)
File "/home/felipe/anaconda3/envs/cuda_12.1/lib/python3.11/site-packages/IPython/core/interactiveshell.py", line 3051, in run_cell
result = self._run_cell(
File "/home/felipe/anaconda3/envs/cuda_12.1/lib/python3.11/site-packages/IPython/core/interactiveshell.py", line 3106, in _run_cell
result = runner(coro)
File "/home/felipe/anaconda3/envs/cuda_12.1/lib/python3.11/site-packages/IPython/core/async_helpers.py", line 129, in _pseudo_sync_runner
coro.send(None)
File "/home/felipe/anaconda3/envs/cuda_12.1/lib/python3.11/site-packages/IPython/core/interactiveshell.py", line 3311, in run_cell_async
has_raised = await self.run_ast_nodes(code_ast.body, cell_name,
File "/home/felipe/anaconda3/envs/cuda_12.1/lib/python3.11/site-packages/IPython/core/interactiveshell.py", line 3493, in run_ast_nodes
if await self.run_code(code, result, async_=asy):
File "/home/felipe/anaconda3/envs/cuda_12.1/lib/python3.11/site-packages/IPython/core/interactiveshell.py", line 3553, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "/tmp/ipykernel_1310472/3735111654.py", line 18, in <module>
import torch
File "<frozen importlib._bootstrap>", line 1176, in _find_and_load
File "<frozen importlib._bootstrap>", line 1147, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 690, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 940, in exec_module
File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
File "/home/felipe/anaconda3/envs/cuda_12.1/lib/python3.11/site-packages/torch/__init__.py", line 1421, in <module>
_C._initExtension(manager_path())
File "<frozen importlib._bootstrap>", line 1176, in _find_and_load
File "<frozen importlib._bootstrap>", line 1147, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 690, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 940, in exec_module
File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
File "/home/felipe/anaconda3/envs/cuda_12.1/lib/python3.11/site-packages/torch/cuda/__init__.py", line 247, in <module>
_lazy_call(_check_capability)
File "/home/felipe/anaconda3/envs/cuda_12.1/lib/python3.11/site-packages/torch/cuda/__init__.py", line 244, in _lazy_call
_queued_calls.append((callable, traceback.format_stack()))
The above exception was the direct cause of the following exception:
RuntimeError                              Traceback (most recent call last)
Cell In[31], line 6
      4 output_path.mkdir(exist_ok=True)
      5 print(f"[fold{fold_id}]")
----> 6 notebook_launcher(train_ddp_accelerate, args=(CFG, fold_id, train, output_path), num_processes=2,mixed_precision='fp16')

File ~/anaconda3/envs/cuda_12.1/lib/python3.11/site-packages/accelerate/launchers.py:210, in notebook_launcher(function, args, num_processes, mixed_precision, use_port, master_addr, node_rank, num_nodes)
    203     raise RuntimeError(
    204         "CUDA has been initialized before the `notebook_launcher` could create a forked subprocess. "
    205         "This likely stems from an outside import causing issues once the `notebook_launcher()` is called. "
    206         "Please review your imports and test them when running the `notebook_launcher()` to identify "
    207         "which one is problematic and causing CUDA to be initialized."
    208     ) from e
    209 else:
--> 210     raise RuntimeError(f"An issue was found when launching the training: {e}") from e
    212 else:
    213     # No need for a distributed launch otherwise as it's either CPU, GPU or MPS.
    214     if is_mps_available():
RuntimeError: An issue was found when launching the training:
[The RuntimeError message then repeats the full child-process traceback shown above, ending with the same `device >= 0 && device < num_gpus INTERNAL ASSERT FAILED ... device=2` error.]
Expected behavior
It should run on 2 GPUs (with 2 processes) just as it does on 3 GPUs with 3 processes.
Hi @MrRobot2211, could you try to run it without setting os.environ["CUDA_VISIBLE_DEVICES"] = "1,2"? I want to check whether this is the line that causes the issue. Thanks!
Hello, it does run to completion without setting CUDA visible devices (it runs on all 3 GPUs). If you can point me to a code/tutorial that you are confident should run identically both ways, I am happy to try that.
Thanks! I will investigate why it is happening. If you could share a minimal reproducer, that would help me a lot in fixing this issue.
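For reference, a minimal reproducer would presumably need only the launcher call with the visibility variables set after torch is imported, along these lines (an untested sketch, not taken from the original report, assuming a machine with at least three GPUs):

```python
import torch  # imported first, as it would be by earlier notebook cells
from accelerate import Accelerator, notebook_launcher

import os

# Visibility is restricted only *after* torch has been imported,
# mirroring the ordering in the failing snippet above.
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "1,2"


def train():
    # In the report above, the failure is raised while constructing the
    # Accelerator in each forked worker.
    accelerator = Accelerator()
    accelerator.print(f"running on {accelerator.device}")


notebook_launcher(train, num_processes=2)
```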
@MrRobot2211 what happens if you set CUDA_VISIBLE_DEVICES before any import of torch/accelerate, using os.environ?
IIRC this needs to happen before the imports because torch does some things on import.
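That is, moving the environment setup above every torch/accelerate import, something like this (a sketch assembled from the snippets earlier in this issue; CFG, FOLDS, train and train_ddp_accelerate are the objects defined above):

```python
import os

# Device visibility must be set before torch (or anything that imports torch)
# is loaded, so the CUDA runtime only ever sees the two selected GPUs.
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "1,2"
os.environ["NCCL_P2P_DISABLE"] = "1"

from pathlib import Path

import torch  # imported only after the environment is configured
from accelerate import notebook_launcher

# CFG, FOLDS, train and train_ddp_accelerate defined as in the reproduction above.

for fold_id in FOLDS[3:]:
    output_path = Path(f"fold{fold_id}")
    output_path.mkdir(exist_ok=True)
    print(f"[fold{fold_id}]")
    notebook_launcher(
        train_ddp_accelerate,
        args=(CFG, fold_id, train, output_path),
        num_processes=2,
        mixed_precision="fp16",
    )
```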
@muellerzr yep, that did it, thank you. Incidentally, I was also able to get rid of os.environ["NCCL_P2P_DISABLE"] = "1" by creating a get_dataloader function.
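For context, such a helper might look something like this (a hypothetical sketch that simply moves the loader construction from the reproduction above into its own function; HMSHBACSpecDataset, get_path_label, get_transforms and CFG are as defined there):

```python
import torch


def get_dataloader(CFG, fold_id, train_all):
    """Build the train/val DataLoaders for one fold.

    Hypothetical helper based on the reproduction code above: keeping loader
    construction in a function called from within the launched training
    function keeps all data setup inside each worker process.
    """
    train_path_label, val_path_label, _, _ = get_path_label(fold_id, train_all)
    train_transform, val_transform = get_transforms(CFG)

    train_dataset = HMSHBACSpecDataset(**train_path_label, transform=train_transform)
    val_dataset = HMSHBACSpecDataset(**val_path_label, transform=val_transform)

    train_loader = torch.utils.data.DataLoader(
        train_dataset, batch_size=CFG.batch_size, pin_memory=True,
        num_workers=4, shuffle=True, drop_last=True)
    val_loader = torch.utils.data.DataLoader(
        val_dataset, batch_size=CFG.batch_size, pin_memory=True,
        num_workers=4, shuffle=False, drop_last=False)
    return train_loader, val_loader
```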
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.