I set CUDA_VISIBLE_DEVICES=0,1 or CUDA_VISIBLE_DEVICES=0.

Either way, training turned out to run on the CPU.

Apr 14 '25 11:04 flydragon2018

Something below may help.

Installing nvtop can help with debugging as well.

🔍 CUDA Training Troubleshooting Guide

On my rig I use CUDA 12.8, but the commands below use 12.1. Just swap cu121 for the tag that matches your version, such as cu128 in my case.

🚨 Most Likely Issues

1️⃣ PyTorch Not Installed with CUDA Support

# Check if PyTorch recognizes CUDA
import torch
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"CUDA version: {torch.version.cuda}")  # Should NOT be None

If torch.cuda.is_available() returns False or torch.version.cuda is None, reinstall PyTorch with CUDA support:

# For CUDA 12.1 - adjust version as needed
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
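The cu121 suffix in the index URL is just the CUDA version with the dot removed. A minimal sketch of that mapping, assuming a MAJOR.MINOR version string; `wheel_index_url` is a hypothetical helper name, and not every CUDA version has a published wheel index, so check the PyTorch install page before relying on the result. Keep in mind that the CUDA version printed by nvidia-smi is the highest one the driver supports, not necessarily the one you must install.

```python
import re


def wheel_index_url(cuda_version: str) -> str:
    """Turn a CUDA version such as '12.8' into a PyTorch wheel index URL.

    Hypothetical helper: it only builds the URL, it does not check that
    wheels are actually published for that version.
    """
    match = re.fullmatch(r"(\d+)\.(\d+)", cuda_version.strip())
    if match is None:
        raise ValueError(f"expected MAJOR.MINOR, got {cuda_version!r}")
    major, minor = match.groups()
    tag = f"cu{major}{minor}"  # '12.1' -> 'cu121', '12.8' -> 'cu128'
    return f"https://download.pytorch.org/whl/{tag}"


if __name__ == "__main__":
    print("pip install torch torchvision torchaudio --index-url",
          wheel_index_url("12.8"))
```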

2️⃣ Check Model/Tensor Device Assignment

# Explicitly move model to GPU
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

# Move model and data to GPU
model = model.to(device)
inputs = inputs.to(device)  # Do this for all tensors
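To confirm the move actually happened, it can help to print where the parameters live. A rough sketch, assuming any model object that exposes `.parameters()`; `parameter_devices` and `demo` are hypothetical names, and the `nn.Linear` layer is only a stand-in for the real model.

```python
from collections import Counter


def parameter_devices(model) -> Counter:
    """Count how many parameter tensors sit on each device."""
    return Counter(str(p.device) for p in model.parameters())


def demo() -> None:
    # Imported here so the helper above stays usable without PyTorch.
    import torch
    from torch import nn

    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    model = nn.Linear(8, 2).to(device)      # stand-in for the real model
    inputs = torch.randn(4, 8).to(device)   # .to() returns a new tensor, so reassign
    print("parameters:", dict(parameter_devices(model)))
    print("inputs:", inputs.device, "outputs:", model(inputs).device)


if __name__ == "__main__":
    try:
        demo()
    except ImportError:
        print("PyTorch is not installed in this environment")
```

If the counter shows only "cpu" while torch.cuda.is_available() is True, the training script is probably never calling .to(device) on the model.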

3️⃣ Verify GPU Usage During Training

# Run this in a separate terminal while training
watch -n 1 nvidia-smi

or

nvtop

🔧 Additional Troubleshooting Steps

4️⃣ CUDA Environment Variables

# Show current CUDA environment vars
echo $CUDA_VISIBLE_DEVICES

# Set specific GPU(s)
export CUDA_VISIBLE_DEVICES=0  # Use only first GPU
# or
export CUDA_VISIBLE_DEVICES=0,1  # Use first and second GPU
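Two details about this variable trip people up: it is read when CUDA initializes, so it has to be set before the process first touches the GPU, and the visible devices are renumbered from zero inside the process. A small sketch of how the value is interpreted, using only the standard library; `visible_devices` and `describe` are hypothetical helper names.

```python
import os
from typing import List, Mapping, Optional


def visible_devices(env: Mapping[str, str] = os.environ) -> Optional[List[str]]:
    """Return the physical GPU ids the process may use, or None if unrestricted."""
    raw = env.get("CUDA_VISIBLE_DEVICES")
    if raw is None:
        return None  # variable unset: every GPU is visible
    return [item.strip() for item in raw.split(",") if item.strip()]


def describe(env: Mapping[str, str] = os.environ) -> str:
    """Explain in one line what the current setting means."""
    ids = visible_devices(env)
    if ids is None:
        return "CUDA_VISIBLE_DEVICES is unset: every GPU is visible"
    if not ids:
        return "CUDA_VISIBLE_DEVICES is empty: no GPU is visible"
    # Inside the process the visible GPUs are renumbered starting at 0.
    mapping = ", ".join(f"cuda:{i} -> physical GPU {gpu}" for i, gpu in enumerate(ids))
    return f"{len(ids)} GPU(s) visible: {mapping}"


if __name__ == "__main__":
    print(describe())
```

So with CUDA_VISIBLE_DEVICES=4 the training code should still ask for cuda:0, and an accidentally empty value hides every GPU.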

5️⃣ CUDA Installation Verification

# Check NVIDIA driver
nvidia-smi

# Check CUDA compiler
nvcc --version
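The same two checks can be run from Python, which is handy inside a notebook or a remote job. A minimal sketch using only the standard library; `run_tool` is a hypothetical helper that returns None when a command is missing or fails. Note that the pip wheels bundle their own CUDA runtime, so a missing nvcc usually matters only if you compile CUDA extensions.

```python
import shutil
import subprocess
from typing import Optional, Sequence


def run_tool(command: Sequence[str], timeout: float = 10.0) -> Optional[str]:
    """Run a command and return its stdout, or None if it is missing or fails."""
    if shutil.which(command[0]) is None:
        return None  # not on PATH
    try:
        result = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout, check=True
        )
    except (subprocess.SubprocessError, OSError):
        return None  # non-zero exit status, timeout, or launch failure
    return result.stdout.strip()


if __name__ == "__main__":
    gpus = run_tool([
        "nvidia-smi",
        "--query-gpu=name,driver_version,memory.total",
        "--format=csv,noheader",
    ])
    nvcc = run_tool(["nvcc", "--version"])
    print("nvidia-smi:", gpus or "not found or failed (check the driver)")
    print("nvcc:", nvcc.splitlines()[-1] if nvcc else "not found")
```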

6️⃣ Memory Issues

# First try reducing the batch size.
# You can also release cached, unused GPU memory before running your model:
torch.cuda.empty_cache()
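Reducing the batch size by hand gets tedious, so here is a rough sketch of automating it. It assumes a hypothetical `run_step(batch_size)` callable that runs one training step, and it relies on PyTorch's CUDA out-of-memory errors being RuntimeErrors whose message contains "out of memory". The demo uses a simulated step so it runs without a GPU.

```python
from typing import Callable


def largest_working_batch_size(
    run_step: Callable[[int], None], start: int, minimum: int = 1
) -> int:
    """Halve the batch size until run_step(batch_size) stops running out of memory."""
    batch_size = start
    while batch_size >= minimum:
        try:
            run_step(batch_size)
            return batch_size
        except RuntimeError as error:
            # Re-raise anything that is not an out-of-memory failure.
            if "out of memory" not in str(error).lower():
                raise
            # In real training, torch.cuda.empty_cache() could be called here.
            batch_size //= 2
    raise RuntimeError(f"even batch size {minimum} does not fit in memory")


if __name__ == "__main__":
    def fake_step(batch_size: int) -> None:
        # Simulated step: pretend anything above 12 samples exhausts memory.
        if batch_size > 12:
            raise RuntimeError("CUDA out of memory (simulated)")

    print(largest_working_batch_size(fake_step, start=64))  # prints 8
```

Keep in mind that empty_cache() only returns cached, unused blocks to the driver; memory held by live tensors stays allocated.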

🧰 Complete CUDA Setup (If Needed)

1. Install Compatible NVIDIA Driver

Visit [NVIDIA Driver Downloads](https://www.nvidia.com/drivers) and install the appropriate driver.

2. Install CUDA Toolkit

Visit [NVIDIA CUDA Toolkit Archive](https://developer.nvidia.com/cuda-toolkit-archive) and install a compatible version.

3. Set Environment Variables

Windows:

Path += C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\bin
Path += C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\libnvvp

Linux:

export PATH="/usr/local/cuda-12.1/bin${PATH:+:${PATH}}"
export LD_LIBRARY_PATH="/usr/local/cuda-12.1/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"

4. Install PyTorch with CUDA Support

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

5. Restart System

A full restart can resolve many CUDA initialization issues.

📊 Debugging Code Example

import torch
import numpy as np
import time

# Debug info
print(f"PyTorch: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"CUDA version: {torch.version.cuda}")
if torch.cuda.is_available():
    print(f"GPU device name: {torch.cuda.get_device_name(0)}")
    print(f"GPU memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB")

# Test CPU vs GPU performance
def test_performance():
    # Create random tensor
    size = 5000
    cpu_tensor = torch.randn(size, size)
    
    # CPU test
    start = time.time()
    cpu_result = torch.matmul(cpu_tensor, cpu_tensor)
    cpu_time = time.time() - start
    print(f"CPU time: {cpu_time:.4f} seconds")
    
    # GPU test (if available)
    if torch.cuda.is_available():
        gpu_tensor = cpu_tensor.cuda()
        torch.cuda.synchronize()  # Wait for CUDA operations to complete
        
        start = time.time()
        gpu_result = torch.matmul(gpu_tensor, gpu_tensor)
        torch.cuda.synchronize()  # Wait for computation to complete
        gpu_time = time.time() - start
        
        print(f"GPU time: {gpu_time:.4f} seconds")
        print(f"GPU speedup: {cpu_time/gpu_time:.1f}x faster")
        
        # Verify results match element-wise, with a tolerance for float32 rounding
        matches = np.allclose(cpu_result.numpy(), gpu_result.cpu().numpy(),
                              rtol=1e-3, atol=1e-2)
        print(f"Results match: {matches}")

test_performance()

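If the GPU time is significantly lower than the CPU time, CUDA is working correctly. The first GPU call includes one-time initialization overhead, so run the test twice if the numbers look close.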

Good luck! Let me know if you need additional assistance with GPU training. 🚀

Apr 16 '25 01:04 hidara2000

The code seems to support only distributed multi-GPU training.

Apr 25 '25 04:04 flydragon2018

@flydragon2018 As stated in our paper, the model can be trained using a single GPU. You can try the following command; it should work:

CUDA_VISIBLE_DEVICES=4 python train.py --use-amp --seed=0 -c configs/deimv2/deimv2_hgnetv2_l_60e.yml

Apr 25 '25 04:04 ShihuaHuang95

@flydragon2018 As stated in our paper, the model can be trained using a single GPU. You can try the following command; it should work: CUDA_VISIBLE_DEVICES=4 python train.py --use-amp --seed=0 -c configs/deimv2/deimv2_hgnetv2_l_60e.yml

I'm sorry, but the path configs/deimv2/deimv2_hgnetv2_l_60e.yml does not exist in the official repository. Does the repository need to be updated?

Apr 26 '25 09:04 Lcance

How do I fine-tune with one or two GPUs?
