How do I fine-tune with two GPUs or a single GPU?
I set CUDA_VISIBLE_DEVICES=0,1 or CUDA_VISIBLE_DEVICES=0, but training ends up running on the CPU.
Something below may help.
Installing nvtop can also help with debugging.
🔍 CUDA Training Troubleshooting Guide
On my rig I use CUDA 12.8, but the commands below use 12.1. Just replace cu121 with the tag for your version, cu12x or cu11x, where x is the minor version (8 in my case, 1 below).
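The tag rule is mechanical, so here is a minimal Python sketch of it. `wheel_index_url` is a hypothetical helper, and not every CUDA version has a published wheel index, so check the install selector on pytorch.org before relying on the URL it produces.

```python
def wheel_index_url(cuda_version: str) -> str:
    """Map a CUDA version such as "12.8" to a PyTorch wheel index URL (hypothetical helper)."""
    parts = cuda_version.strip().split(".")
    if len(parts) < 2 or not all(p.isdigit() for p in parts[:2]):
        raise ValueError(f"expected a version like '12.1', got {cuda_version!r}")
    major, minor = parts[0], parts[1]
    # The tag is "cu" + major + minor with no separator: 12.1 -> cu121, 11.8 -> cu118
    return f"https://download.pytorch.org/whl/cu{major}{minor}"


if __name__ == "__main__":
    for version in ("12.1", "12.8", "11.8"):
        url = wheel_index_url(version)
        print(f"pip install torch torchvision torchaudio --index-url {url}")
```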
🚨 Most Likely Issues
1️⃣ PyTorch Not Installed with CUDA Support
```python
# Check if PyTorch recognizes CUDA
import torch
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"CUDA version: {torch.version.cuda}")  # Should NOT be None
```
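If torch.version.cuda is None, the installed PyTorch is a CPU-only build, so reinstall it with CUDA support. (If torch.version.cuda is set but torch.cuda.is_available() still returns False, check the driver and CUDA_VISIBLE_DEVICES instead.)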
```bash
# For CUDA 12.1 - adjust version as needed
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
```
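The two checks above can be folded into one function that names the most likely cause. This is a sketch with a hypothetical name (`diagnose_cuda`); it assumes an NVIDIA setup, since ROCm builds also report `torch.version.cuda` as None.

```python
import torch


def diagnose_cuda() -> str:
    """Best-effort guess at why training might fall back to the CPU (hypothetical helper)."""
    if torch.version.cuda is None:
        # The wheel was built without CUDA (ROCm builds also land here; this assumes NVIDIA)
        return "CPU-only PyTorch build: reinstall from a CUDA wheel index"
    if not torch.cuda.is_available():
        # CUDA build, but no usable driver/GPU, or CUDA_VISIBLE_DEVICES hides every GPU
        return (f"PyTorch built for CUDA {torch.version.cuda} but no GPU is usable: "
                "check nvidia-smi and CUDA_VISIBLE_DEVICES")
    names = [torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())]
    return f"CUDA OK: {len(names)} visible GPU(s): {', '.join(names)}"


if __name__ == "__main__":
    print(diagnose_cuda())
```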
2️⃣ Check Model/Tensor Device Assignment
```python
import torch

# Pick the GPU explicitly, falling back to the CPU
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")
# Move model and data to GPU (`model` and `inputs` are your own objects)
model = model.to(device)
inputs = inputs.to(device)  # Do this for all tensors
```
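Detection training loops often pass a batch of images plus a list of per-image target dicts, and forgetting to move the targets is an easy mistake. Below is a minimal sketch assuming such a DETR-style batch; the helper `to_device`, the stand-in model, and the `"labels"`/`"boxes"` keys are illustrative, not taken from this repository.

```python
import torch


def to_device(obj, device):
    """Recursively move tensors inside dicts/lists/tuples to `device` (hypothetical helper)."""
    if isinstance(obj, torch.Tensor):
        return obj.to(device, non_blocking=True)
    if isinstance(obj, dict):
        return {k: to_device(v, device) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return type(obj)(to_device(v, device) for v in obj)
    return obj  # ints, strings, None, etc. are left unchanged


device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = torch.nn.Linear(4, 2).to(device)  # stand-in for your detector
images = torch.randn(2, 4)
# Assumed target layout: one dict of tensors per image
targets = [{"labels": torch.tensor([1]), "boxes": torch.rand(1, 4)} for _ in range(2)]
images, targets = to_device((images, targets), device)

# Every parameter and input should report the same device
print(next(model.parameters()).device, images.device, targets[0]["boxes"].device)
```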
3️⃣ Verify GPU Usage During Training
```bash
# Run this in a separate terminal while training
watch -n 1 nvidia-smi
# or
nvtop
```
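To check utilization from a script instead (for example, to confirm that both GPUs are busy when two are made visible), here is a sketch that uses nvidia-smi's CSV query mode. The query fields and format flags are standard nvidia-smi options; the helper names are hypothetical.

```python
import shutil
import subprocess
from typing import Optional

FIELDS = "index,name,utilization.gpu,memory.used,memory.total"


def _num(field: str) -> Optional[int]:
    field = field.strip()
    return int(field) if field.isdigit() else None  # some GPUs report "[N/A]"


def parse_gpu_csv(text: str) -> list:
    """Parse `nvidia-smi --query-gpu=<FIELDS> --format=csv,noheader,nounits` output."""
    gpus = []
    for line in text.strip().splitlines():
        idx, rest = line.split(",", 1)
        name, util, used, total = rest.rsplit(",", 3)  # rsplit tolerates commas in the name
        gpus.append({"index": int(idx), "name": name.strip(), "util_pct": _num(util),
                     "mem_used_mib": _num(used), "mem_total_mib": _num(total)})
    return gpus


def query_gpus() -> list:
    """Return one dict per GPU, or [] if nvidia-smi is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        out = subprocess.run(
            ["nvidia-smi", f"--query-gpu={FIELDS}", "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        ).stdout
    except (subprocess.CalledProcessError, OSError):
        return []
    return parse_gpu_csv(out)


if __name__ == "__main__":
    for gpu in query_gpus():
        print(f"GPU {gpu['index']} ({gpu['name']}): {gpu['util_pct']}% util, "
              f"{gpu['mem_used_mib']}/{gpu['mem_total_mib']} MiB")
```

A GPU that stays near 0% while its process is "training" is a strong hint that the work is happening on the CPU.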
🔧 Additional Troubleshooting Steps
4️⃣ CUDA Environment Variables
```bash
# Show current CUDA environment vars
echo $CUDA_VISIBLE_DEVICES
# Set specific GPU(s)
export CUDA_VISIBLE_DEVICES=0  # Use only first GPU
# or
export CUDA_VISIBLE_DEVICES=0,1  # Use first and second GPU
```
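Two details of CUDA_VISIBLE_DEVICES matter here. First, the listed GPUs are renumbered from cuda:0 inside the process, and CUDA ignores the first invalid index and everything after it; so CUDA_VISIBLE_DEVICES=4 on a machine with fewer than five GPUs hides every GPU, and a script that picks its device with torch.cuda.is_available() falls back to the CPU. Second, making two GPUs visible does not by itself split work across them: a single `python train.py` process still needs a launcher such as torchrun to use both. The variable must also be set before the process initializes CUDA, and the physical index follows CUDA's enumeration order (set CUDA_DEVICE_ORDER=PCI_BUS_ID to match nvidia-smi's numbering). The sketch below models the renumbering rule in plain Python for integer indices only; `visible_gpu_map` is a hypothetical helper, and UUID or MIG entries are not handled.

```python
import os
from typing import Dict, Optional


def visible_gpu_map(env_value: Optional[str], physical_count: int) -> Dict[int, int]:
    """Map logical ordinals (cuda:0, cuda:1, ...) to physical GPU indices.

    Sketch of the CUDA_VISIBLE_DEVICES rule for plain integer indices only;
    UUID ("GPU-...") and MIG entries are not handled here.
    """
    if env_value is None:
        # Unset: every GPU is visible under its own index
        return {i: i for i in range(physical_count)}
    mapping: Dict[int, int] = {}
    for token in env_value.split(","):
        token = token.strip()
        # CUDA stops at the first invalid entry, hiding it and everything after it
        if not token.isdigit() or int(token) >= physical_count:
            break
        mapping[len(mapping)] = int(token)
    return mapping


if __name__ == "__main__":
    current = os.environ.get("CUDA_VISIBLE_DEVICES")
    print(f"CUDA_VISIBLE_DEVICES={current!r} on a 2-GPU machine ->", visible_gpu_map(current, 2))
```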
5️⃣ CUDA Installation Verification
```bash
# Check NVIDIA driver
nvidia-smi
# Check CUDA compiler
nvcc --version
```
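If nvidia-smi is missing or fails, the driver is usually the problem. A missing nvcc is less serious: PyTorch's pip wheels bundle their own CUDA runtime, so nvcc is typically needed only when compiling CUDA extensions. A minimal sketch that reports both from Python (the function name is hypothetical):

```python
import shutil


def cuda_tool_status() -> dict:
    """Report where the driver tool and CUDA compiler live, or None if not on PATH."""
    return {tool: shutil.which(tool) for tool in ("nvidia-smi", "nvcc")}


if __name__ == "__main__":
    status = cuda_tool_status()
    if status["nvidia-smi"] is None:
        print("nvidia-smi not found: the NVIDIA driver is probably missing")
    if status["nvcc"] is None:
        # PyTorch wheels ship their own CUDA runtime, so this alone is not fatal
        print("nvcc not found: usually only needed to build CUDA extensions")
    print(status)
```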
6️⃣ Memory Issues
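```python
import torch

# Try reducing the batch size first; that is the most effective fix for out-of-memory errors.
# empty_cache() releases cached blocks PyTorch is no longer using; it cannot free live tensors.
torch.cuda.empty_cache()
```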
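Reducing the batch size can be automated by halving it after each out-of-memory error and retrying. This is a sketch of the idea, not a feature of this repository's train.py; `step_fn` is a hypothetical callable standing in for one forward/backward pass of your own loop. empty_cache() is called after leaving the except block so the failed step's tensors, still referenced by the traceback inside the block, can be released first.

```python
import torch


def run_with_backoff(step_fn, batch_size: int, min_batch_size: int = 1) -> int:
    """Call step_fn(batch_size), halving the batch size after each CUDA OOM error.

    Returns the batch size that succeeded. `step_fn` is a hypothetical callable.
    """
    while batch_size >= min_batch_size:
        try:
            step_fn(batch_size)
            return batch_size
        except RuntimeError as err:  # torch.cuda.OutOfMemoryError subclasses RuntimeError
            if "out of memory" not in str(err):
                raise
        # Outside the except block the traceback is gone, so its tensors can be freed
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
        print(f"OOM at batch size {batch_size}, retrying with {batch_size // 2}")
        batch_size //= 2
    raise RuntimeError(f"no batch size >= {min_batch_size} fits in memory")


if __name__ == "__main__":
    def fake_step(bs):  # simulates a GPU that only fits 8 samples
        if bs > 8:
            raise RuntimeError("CUDA out of memory. Tried to allocate ...")

    print(run_with_backoff(fake_step, batch_size=32))  # 32 -> 16 -> 8
```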
🧰 Complete CUDA Setup (If Needed)
1. Install Compatible NVIDIA Driver
Visit [NVIDIA Driver Downloads](https://www.nvidia.com/drivers) and install the appropriate driver.
2. Install CUDA Toolkit
Visit [NVIDIA CUDA Toolkit Archive](https://developer.nvidia.com/cuda-toolkit-archive) and install a compatible version.
3. Set Environment Variables
Windows (add these directories to the Path environment variable):
```
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\bin
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\libnvvp
```
Linux (CUDA 12.x is not available for macOS):
```bash
export PATH="/usr/local/cuda-12.1/bin${PATH:+:${PATH}}"
export LD_LIBRARY_PATH="/usr/local/cuda-12.1/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
```
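To confirm the exports took effect in the shell that launches training, a small check like the one below can help. It assumes the default /usr/local/cuda-&lt;version&gt; install location from the exports above; `cuda_env_configured` is a hypothetical helper, and this only matters for a local toolkit install, not for PyTorch's bundled runtime.

```python
import os


def cuda_env_configured(version: str = "12.1", env=None) -> dict:
    """Check that the CUDA bin/lib64 dirs from the exports above are on the search paths.

    Assumes the default Linux install root /usr/local/cuda-<version>.
    """
    env = os.environ if env is None else env
    root = f"/usr/local/cuda-{version}"
    path_entries = env.get("PATH", "").split(":")
    lib_entries = env.get("LD_LIBRARY_PATH", "").split(":")
    return {
        "bin_on_PATH": f"{root}/bin" in path_entries,
        "lib64_on_LD_LIBRARY_PATH": f"{root}/lib64" in lib_entries,
    }


if __name__ == "__main__":
    print(cuda_env_configured("12.1"))
```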
4. Install PyTorch with CUDA Support
```bash
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
```
5. Restart System
A full restart can resolve many CUDA initialization issues.
📊 Debugging Code Example
```python
import time

import torch

# Debug info
print(f"PyTorch: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"CUDA version: {torch.version.cuda}")
if torch.cuda.is_available():
    print(f"GPU device name: {torch.cuda.get_device_name(0)}")
    print(f"GPU memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB")


# Test CPU vs GPU performance
def test_performance(size: int = 5000):
    # Create a random matrix on the CPU
    cpu_tensor = torch.randn(size, size)

    # CPU test
    start = time.perf_counter()
    cpu_result = torch.matmul(cpu_tensor, cpu_tensor)
    cpu_time = time.perf_counter() - start
    print(f"CPU time: {cpu_time:.4f} seconds")

    # GPU test (if available)
    if torch.cuda.is_available():
        gpu_tensor = cpu_tensor.cuda()
        torch.matmul(gpu_tensor, gpu_tensor)  # Warm-up: the first call includes cuBLAS setup
        torch.cuda.synchronize()  # Wait for the copy and warm-up to finish
        start = time.perf_counter()
        gpu_result = torch.matmul(gpu_tensor, gpu_tensor)
        torch.cuda.synchronize()  # Kernels run asynchronously; wait before stopping the clock
        gpu_time = time.perf_counter() - start
        print(f"GPU time: {gpu_time:.4f} seconds")
        print(f"GPU speedup: {cpu_time / gpu_time:.1f}x faster")

        # Verify results match element-wise; float32 summation order differs between devices
        match = torch.allclose(cpu_result, gpu_result.cpu(), rtol=1e-3, atol=1e-3)
        print(f"Results match: {match}")


test_performance()
```
If the GPU time is significantly lower than the CPU time, CUDA is working correctly.
Good luck! Let me know if you need additional assistance with GPU training. 🚀
The code seems to support only distributed (multi-GPU) training.
@flydragon2018 As stated in our paper, the model can be trained using a single GPU. You can try the following command; it should work:
```bash
export CUDA_VISIBLE_DEVICES=4
python train.py --use-amp --seed=0 -c configs/deimv2/deimv2_hgnetv2_l_60e.yml
```
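A sketch of how the single-GPU launch relates to a multi-GPU one, assuming this repository's train.py also accepts torchrun launches (one process per GPU), as DETR-family training scripts commonly do. `build_launch` is a hypothetical helper; `--nproc_per_node` is a standard torchrun option, and the `--use-amp`, `--seed=0`, and `-c` flags come from the command above. The GPU indices must exist on your machine, and the config path must exist in your checkout.

```python
import os


def build_launch(gpu_ids, config, extra=("--use-amp", "--seed=0")):
    """Return (argv, env) for a single-GPU or multi-GPU training launch (sketch)."""
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=",".join(str(g) for g in gpu_ids))
    if len(gpu_ids) == 1:
        # One visible GPU: a plain process is enough, and it sees that GPU as cuda:0
        argv = ["python", "train.py", *extra, "-c", config]
    else:
        # One process per GPU; torchrun sets RANK/WORLD_SIZE/LOCAL_RANK for each
        argv = ["torchrun", f"--nproc_per_node={len(gpu_ids)}", "train.py", *extra, "-c", config]
    return argv, env


if __name__ == "__main__":
    cfg = "configs/deimv2/deimv2_hgnetv2_l_60e.yml"  # replace with a config in your checkout
    for gpus in ([0], [0, 1]):
        argv, env = build_launch(gpus, cfg)
        print(f"CUDA_VISIBLE_DEVICES={env['CUDA_VISIBLE_DEVICES']}", " ".join(argv))
```

The returned `argv` and `env` can be passed to `subprocess.run(argv, env=env, check=True)` to actually start training.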
I'm sorry, but the file configs/deimv2/deimv2_hgnetv2_l_60e.yml does not exist in the official repository. Does the repository need to be updated?