nvImageCodec Passing torch tensor to encoder leads to hang

Version

0.5.0

Describe the bug.

If you pass a torch tensor to the encoder, the process hangs indefinitely. I think it would be better to at least throw an exception, or perhaps convert the tensor to a CuPy array under the hood.

Check for duplicates

[x] I have searched the open bugs/issues and have found no duplicates for this bug report

Jul 29 '25 13:07 s1ddok

Actually, after further investigation I realized that cupy.asarray(tensor) helps for CPU tensors, but CUDA tensors still hang. Is it possible to encode a torch CUDA tensor with zero copies?

Jul 29 '25 13:07 s1ddok

Hi @s1ddok,

It is possible to pass an nvImageCodec image to torch without a copy using either DLPack or __cuda_array_interface__. Similarly, nvImageCodec can accept a torch tensor directly through the same interfaces. However, there are limitations on the tensor's shape and layout. Currently, the tensor must be three-dimensional in HWC layout (height, width, channels), also known as the interleaved format, and must be stored as a contiguous C-style array.
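The layout constraints above can be checked before calling the encoder. The following is a minimal sketch, not part of nvImageCodec: the helper name looks_like_hwc is hypothetical, and it only inspects the standard __cuda_array_interface__ dictionary (shape, typestr, strides). It cannot tell whether the last axis really holds channels; it only verifies that the buffer is three-dimensional and C-contiguous.

```python
def looks_like_hwc(obj):
    """Return True if obj exposes a 3-D, C-contiguous CUDA buffer.

    Hypothetical helper: it reads only the __cuda_array_interface__ dict.
    """
    iface = getattr(obj, "__cuda_array_interface__", None)
    if iface is None:
        return False  # no CUDA array interface (e.g. a CPU array)

    shape = tuple(iface["shape"])
    if len(shape) != 3:
        return False  # the encoder expects height, width, channels

    strides = iface.get("strides")
    if strides is None:
        return True  # the interface defines None as C-contiguous

    # typestr looks like "<u2" or "|u1"; the trailing digits are the item size in bytes.
    itemsize = int(iface["typestr"][2:])

    # Build the byte strides a C-contiguous array of this shape would have.
    expected = []
    step = itemsize
    for dim in reversed(shape):
        expected.append(step)
        step *= dim
    return tuple(strides) == tuple(reversed(expected))
```

If the check fails for a CHW tensor, tensor.permute(1, 2, 0).contiguous() is the usual way in torch to get an interleaved, contiguous copy, at the cost of one device-side copy.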

I have attached a Jupyter notebook with examples (torch.ipynb, compressed as torch.zip). You need to copy it to the nvImageCodec examples folder, because it references an image from the images subfolder (or modify the path and file name accordingly). It is very similar to the TensorFlow example in the nvImageCodec documentation. Could you please try it and let us know if it works in your case?

A short code snippet might look like this:

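import torch
import matplotlib.pyplot as plt
from nvidia import nvimgcodec

# Assumption: hypothetical path to the folder with the example images; adjust to your setup.
resources_dir = "./images/"

decoder = nvimgcodec.Decoder()
encoder = nvimgcodec.Encoder()

# Decode the file into an nvImageCodec image.
nv_img = decoder.read(resources_dir + "cat-1046544_640.jp2")
# Wrap the image as a torch tensor, then flip it vertically and horizontally.
torch_img_flip_vh = torch.flip(torch.as_tensor(nv_img), dims=[0, 1])
# Pass the torch tensor directly to the encoder.
encoder.write("torch_flipped_vh.j2k", torch_img_flip_vh)
# Wrap the tensor as an nvImageCodec image and copy it to the host for display.
nv_img_vh = nvimgcodec.as_image(torch_img_flip_vh)
plt.imshow(nv_img_vh.cpu())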

Aug 01 '25 14:08 smatysik-nv

@smatysik-nv this code reliably results in CPU-hang for me.

"""
Test script for nvImageCodec TIFF encoding with GPU tensors.
This script generates dummy images and saves them as TIFF files using GPU acceleration.
"""

import torch
import numpy as np
import time
import os
from pathlib import Path
import argparse

try:
    from nvidia import nvimgcodec
    print("✓ nvImageCodec imported successfully")
    NVIMGCODEC_AVAILABLE = True
except ImportError as e:
    print(f"✗ Failed to import nvImageCodec: {e}")
    print("Please install: pip install nvidia-nvimgcodec-cu12[nvtiff]")
    NVIMGCODEC_AVAILABLE = False
    exit(1)

def create_dummy_image(width=1920, height=1080, device='cuda'):
    """Create a dummy RGB image tensor on specified device with full 16-bit range."""
    # Create coordinate grids
    y_coords = torch.linspace(0, 1, height, device=device).unsqueeze(1).expand(-1, width)
    x_coords = torch.linspace(0, 1, width, device=device).unsqueeze(0).expand(height, -1)
    
    # IMPORTANT: Create complex patterns that utilize full 16-bit range
    # Use multiple frequency components for better 16-bit utilization
    r_channel = (
        0.3 * torch.sin(y_coords * 3.14159 * 4) +
        0.2 * torch.cos(x_coords * 3.14159 * 6) +
        0.1 * torch.sin((x_coords + y_coords) * 3.14159 * 8) +
        0.4
    ).clamp(0, 1)
    
    g_channel = (
        0.25 * torch.cos(x_coords * 3.14159 * 5) +
        0.25 * torch.sin(y_coords * 3.14159 * 7) +
        0.15 * torch.cos((x_coords - y_coords) * 3.14159 * 3) +
        0.35
    ).clamp(0, 1)
    
    b_channel = (
        0.2 * torch.sin((x_coords + y_coords) * 3.14159 * 9) +
        0.3 * torch.cos((x_coords * y_coords) * 3.14159 * 2) +
        0.1 * torch.sin(y_coords * 3.14159 * 12) +
        0.4
    ).clamp(0, 1)
    
    # Stack into HWC format and convert to full 16-bit range (0-65535)
    image = torch.stack([r_channel, g_channel, b_channel], dim=-1)
    image = (image * 65535).to(torch.uint16)
    
    return image

def test_nvimgcodec_tiff_encoding(num_images=50, width=1920, height=1080, output_dir="test_tiff_output"):
    """Test nvImageCodec TIFF encoding with multiple images."""
    
    if not torch.cuda.is_available():
        print("⚠️  CUDA not available, using CPU")
        device = 'cpu'
    else:
        device = 'cuda'
        print(f"✓ Using CUDA device: {torch.cuda.get_device_name()}")
    
    # Create output directory
    output_path = Path(output_dir)
    output_path.mkdir(exist_ok=True)
    print(f"✓ Output directory: {output_path.absolute()}")
    
    # Create encoder instance
    try:
        encoder = nvimgcodec.Encoder()
        print("✓ nvImageCodec Encoder created successfully")
    except Exception as e:
        print(f"✗ Failed to create encoder: {e}")
        return False
    
    # Generate and save images
    total_time = 0
    successful_saves = 0
    
    print(f"\n🔄 Generating and saving {num_images} 16-bit TIFF images ({width}x{height})...")
    
    for i in range(num_images):
        try:
            # Create dummy image
            start_time = time.perf_counter()
            image_tensor = create_dummy_image(width, height, device)
            
            # Save as TIFF
            output_file = output_path / f"test_image_{i:04d}.tiff"
            
            # IMPORTANT: nvImageCodec should handle torch tensors with __cuda_array_interface__
            encoder.write(str(output_file), image_tensor)
            
            end_time = time.perf_counter()
            elapsed = end_time - start_time
            total_time += elapsed
            successful_saves += 1
            
            print(f"  ✓ Saved {output_file.name} in {elapsed:.3f}s")
            
            # Print tensor info for first image
            if i == 0:
                print(f"    📊 Tensor info: shape={image_tensor.shape}, dtype={image_tensor.dtype}, device={image_tensor.device}")
                if hasattr(image_tensor, '__cuda_array_interface__'):
                    print(f"    🎯 CUDA array interface available")
                else:
                    print(f"    ⚠️  No CUDA array interface")
            
        except Exception as e:
            print(f"  ✗ Failed to save image {i}: {e}")
            continue
    
    # Print summary
    avg_time = total_time / successful_saves if successful_saves > 0 else 0
    print(f"\n📈 Results:")
    print(f"  • Successfully saved: {successful_saves}/{num_images} images")
    print(f"  • Total time: {total_time:.3f}s")
    print(f"  • Average time per image: {avg_time:.3f}s")
    print(f"  • Images per second: {1/avg_time:.1f}" if avg_time > 0 else "  • Images per second: N/A")
    
    # Check file sizes
    if successful_saves > 0:
        first_file = output_path / "test_image_0000.tiff"
        if first_file.exists():
            file_size_mb = first_file.stat().st_size / (1024 * 1024)
            print(f"  • File size: {file_size_mb:.2f} MB per image")
    
    return successful_saves == num_images

def test_cpu_vs_gpu_performance(width=1920, height=1080):
    """Compare CPU vs GPU tensor performance."""
    print(f"\n🚀 Performance comparison: CPU vs GPU tensors")
    
    if not torch.cuda.is_available():
        print("⚠️  CUDA not available, skipping GPU test")
        return
    
    encoder = nvimgcodec.Encoder()
    output_path = Path("test_tiff_output")
    output_path.mkdir(exist_ok=True)  # the basic test may have written to a different --output_dir
    
    # Test with CPU tensor
    print("  Testing CPU tensor...")
    cpu_tensor = create_dummy_image(width, height, 'cpu')
    start_time = time.perf_counter()
    encoder.write(str(output_path / "cpu_test.tiff"), cpu_tensor)
    cpu_time = time.perf_counter() - start_time
    print(f"    CPU time: {cpu_time:.3f}s")
    
    # Test with GPU tensor
    print("  Testing GPU tensor...")
    gpu_tensor = create_dummy_image(width, height, 'cuda')
    start_time = time.perf_counter()
    encoder.write(str(output_path / "gpu_test.tiff"), gpu_tensor)
    gpu_time = time.perf_counter() - start_time
    print(f"    GPU time: {gpu_time:.3f}s")
    
    if gpu_time < cpu_time:
        speedup = cpu_time / gpu_time
        print(f"    🎯 GPU is {speedup:.1f}x faster!")
    else:
        slowdown = gpu_time / cpu_time
        print(f"    ⚠️  GPU is {slowdown:.1f}x slower (unexpected)")

def main():
    parser = argparse.ArgumentParser(description="Test nvImageCodec TIFF encoding")
    parser.add_argument("--num_images", type=int, default=5, help="Number of test images to generate")
    parser.add_argument("--width", type=int, default=1920, help="Image width")
    parser.add_argument("--height", type=int, default=1080, help="Image height")
    parser.add_argument("--output_dir", type=str, default="test_tiff_output", help="Output directory")
    parser.add_argument("--performance_test", action="store_true", help="Run CPU vs GPU performance comparison")
    args = parser.parse_args()
    
    print("🧪 nvImageCodec TIFF Encoding Test")
    print("=" * 50)
    
    # Basic encoding test
    success = test_nvimgcodec_tiff_encoding(
        num_images=args.num_images,
        width=args.width,
        height=args.height,
        output_dir=args.output_dir
    )
    
    # Performance comparison
    if args.performance_test:
        test_cpu_vs_gpu_performance(args.width, args.height)
    
    if success:
        print("\n🎉 All tests passed! nvImageCodec TIFF encoding is working correctly.")
    else:
        print("\n❌ Some tests failed. Check the errors above.")
        exit(1)

if __name__ == "__main__":
    main()

There is no way to quit it other than killing the process. The code is LLM-generated, but since it is small I thought I'd add it here.

Aug 04 '25 13:08 s1ddok

Hi @s1ddok,

With v0.5, your example ran without any issues when I passed no arguments. However, it did hang when I passed the --performance_test option, and the hang occurred during CPU tensor encoding.

If you enable a higher logging level (via the PYNVIMGCODEC_VERBOSITY=2 environment variable), you will see a message:

[WARNING] [pynvimgcodec] Input object #0 cannot be converted to Image. Unsupported device in DLTensor. Only CUDA-accessible memory buffers can be wrapped.
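One way to capture that warning without ending up with a stuck terminal is to launch the script from a small wrapper that sets the variable and enforces a timeout. This is only a sketch: the function name run_with_verbose_log, the script name repro.py, and the 60-second limit are all hypothetical, and only the PYNVIMGCODEC_VERBOSITY variable comes from this thread.

```python
import os
import subprocess
import sys


def run_with_verbose_log(cmd, verbosity=2, timeout_s=60):
    """Run cmd with PYNVIMGCODEC_VERBOSITY set.

    Returns the child's exit code, or None if it was killed after timeout_s.
    """
    # Copy the current environment and add the verbosity variable.
    env = dict(os.environ, PYNVIMGCODEC_VERBOSITY=str(verbosity))
    try:
        # Output is not captured, so log messages go straight to the terminal.
        proc = subprocess.run(cmd, env=env, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising, so nothing is left hanging.
        return None
    return proc.returncode


if __name__ == "__main__":
    # Hypothetical script name; pass your own reproducer here.
    code = run_with_verbose_log([sys.executable, "repro.py", "--performance_test"])
    print("hung (killed after timeout)" if code is None else f"exit code {code}")
```

A return value of None then distinguishes a hang from an ordinary failure, which makes the behavior easier to compare between v0.5 and v0.6.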

nvImageCodec does not yet support tensors on the CPU; they need to be on the GPU.

In v0.5, there was a bug where encoding would freeze (as you saw) if no valid input was provided. This was fixed in v0.6. Please update and verify that your example works now.
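For anyone who cannot upgrade right away, a small guard in front of the encoder can turn the silent hang into an explicit error or an explicit copy to the GPU. The sketch below is an assumption about how such a guard could look, not nvImageCodec behavior: to_encoder_input is a hypothetical name, and it relies only on standard torch.Tensor members (dim(), is_cuda, cuda(), contiguous()).

```python
def to_encoder_input(tensor):
    """Return a CUDA, C-contiguous, 3-D tensor, or raise instead of passing bad input on.

    Hypothetical helper; `tensor` is expected to be a torch.Tensor.
    """
    if tensor.dim() != 3:
        raise ValueError(f"expected a 3-D HWC tensor, got {tensor.dim()} dimensions")
    if not tensor.is_cuda:
        # Copies host memory to the current CUDA device (not zero-copy).
        tensor = tensor.cuda()
    # Returns the same tensor when it is already contiguous.
    return tensor.contiguous()
```

Usage would then be encoder.write(str(output_file), to_encoder_input(image_tensor)), so a CPU tensor is moved to the device before it reaches the encoder.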

If it still hangs for you with CUDA tensors, could you share the package versions in your Python environment? I would suggest updating to the newest versions, as I didn't observe any problems with CUDA tensors.

Aug 14 '25 06:08 mkepa-nv