Passing a torch tensor to the encoder leads to a hang
Version
0.5.0
Describe the bug.
If you pass a torch tensor to the encoder, the process hangs indefinitely. I think it would be better to at least throw an exception, or perhaps convert the tensor to a CuPy array under the hood.
Minimum reproducible example
Environment details
Relevant log output
Other/Misc.
No response
Check for duplicates
- [x] I have searched the open bugs/issues and have found no duplicates for this bug report
Actually, after further investigation I realized that cupy.asarray(tensor) helps for CPU tensors, but CUDA tensors still hang. Is it possible to encode a torch CUDA tensor with zero copies?
Hi @s1ddok,
It is possible to pass an nvImageCodec image to torch without a copy using either DLPack or __cuda_array_interface__. Similarly, a torch tensor can be accepted directly by nvImageCodec through the same interfaces. However, there are limitations on the tensor's shape and layout. Currently, the tensor must be three-dimensional in HWC layout (height, width, channels), also known as the interleaved format, and must be stored as a C-contiguous array.
I have attached a Jupyter notebook with examples (torch.ipynb, compressed as torch.zip). You will need to copy it to the nvImageCodec examples folder, because it references an image from the image subfolder (or modify the path/file name accordingly). It is very similar to the TensorFlow example in the nvImageCodec documentation. Could you please try it and let us know if it works in your case?
A short code snippet could look like this:
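The layout constraints above can be checked before a tensor is handed to the encoder. The following is only a minimal sketch, assuming a PyTorch tensor and using nothing but its generic methods (`dim`, `is_contiguous`, `permute`, `contiguous`). The helper names `describe_layout_problem` and `chw_to_hwc` are hypothetical and are not part of nvImageCodec.

```python
def describe_layout_problem(tensor):
    """Return a message if `tensor` is not 3-D and C-contiguous, else None.

    Hypothetical helper: it uses only generic tensor methods and never
    calls nvImageCodec. It cannot tell HWC from CHW by shape alone.
    """
    if tensor.dim() != 3:
        return f"expected a 3-D (H, W, C) tensor, got {tensor.dim()} dimension(s)"
    if not tensor.is_contiguous():
        return "tensor is not C-contiguous; call .contiguous() first"
    return None


def chw_to_hwc(tensor):
    """Convert a channels-first (C, H, W) tensor to a contiguous (H, W, C) one."""
    # permute() only rearranges strides, so contiguous() is needed to
    # materialize the interleaved layout in memory (this makes a copy).
    return tensor.permute(1, 2, 0).contiguous()
```

For example, a channels-first tensor of shape `(3, 1080, 1920)` becomes `(1080, 1920, 3)` after `chw_to_hwc`. That conversion copies the data, so it is not zero-copy.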
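import torch
from nvidia import nvimgcodec
import matplotlib.pyplot as plt

decoder = nvimgcodec.Decoder()
encoder = nvimgcodec.Encoder()

# resources_dir is assumed to be the path to the folder containing the example images
nv_img = decoder.read(resources_dir + "cat-1046544_640.jp2")

# Wrap the decoded image as a torch tensor and flip it vertically and horizontally
torch_img_flip_vh = torch.flip(torch.as_tensor(nv_img), dims=[0,1])

# Pass the torch tensor directly to the encoder
encoder.write("torch_flipped_vh.j2k", torch_img_flip_vh)

# Wrap the tensor as an nvImageCodec image and copy it to the host for display
nv_img_vh = nvimgcodec.as_image(torch_img_flip_vh)
plt.imshow(nv_img_vh.cpu())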
@smatysik-nv this code reliably results in CPU-hang for me.
"""
Test script for nvImageCodec TIFF encoding with GPU tensors.
This script generates dummy images and saves them as TIFF files using GPU acceleration.
"""
import torch
import numpy as np
import time
import os
from pathlib import Path
import argparse

try:
    from nvidia import nvimgcodec
    print("✓ nvImageCodec imported successfully")
    NVIMGCODEC_AVAILABLE = True
except ImportError as e:
    print(f"✗ Failed to import nvImageCodec: {e}")
    print("Please install: pip install nvidia-nvimgcodec-cu12[nvtiff]")
    NVIMGCODEC_AVAILABLE = False
    exit(1)

def create_dummy_image(width=1920, height=1080, device='cuda'):
    """Create a dummy RGB image tensor on specified device with full 16-bit range."""
    # Create coordinate grids
    y_coords = torch.linspace(0, 1, height, device=device).unsqueeze(1).expand(-1, width)
    x_coords = torch.linspace(0, 1, width, device=device).unsqueeze(0).expand(height, -1)
    # IMPORTANT: Create complex patterns that utilize full 16-bit range
    # Use multiple frequency components for better 16-bit utilization
    r_channel = (
        0.3 * torch.sin(y_coords * 3.14159 * 4) +
        0.2 * torch.cos(x_coords * 3.14159 * 6) +
        0.1 * torch.sin((x_coords + y_coords) * 3.14159 * 8) +
        0.4
    ).clamp(0, 1)
    g_channel = (
        0.25 * torch.cos(x_coords * 3.14159 * 5) +
        0.25 * torch.sin(y_coords * 3.14159 * 7) +
        0.15 * torch.cos((x_coords - y_coords) * 3.14159 * 3) +
        0.35
    ).clamp(0, 1)
    b_channel = (
        0.2 * torch.sin((x_coords + y_coords) * 3.14159 * 9) +
        0.3 * torch.cos((x_coords * y_coords) * 3.14159 * 2) +
        0.1 * torch.sin(y_coords * 3.14159 * 12) +
        0.4
    ).clamp(0, 1)
    # Stack into HWC format and convert to full 16-bit range (0-65535)
    image = torch.stack([r_channel, g_channel, b_channel], dim=-1)
    image = (image * 65535).to(torch.uint16)
    return image

def test_nvimgcodec_tiff_encoding(num_images=50, width=1920, height=1080, output_dir="test_tiff_output"):
    """Test nvImageCodec TIFF encoding with multiple images."""
    if not torch.cuda.is_available():
        print("⚠️ CUDA not available, using CPU")
        device = 'cpu'
    else:
        device = 'cuda'
        print(f"✓ Using CUDA device: {torch.cuda.get_device_name()}")
    # Create output directory
    output_path = Path(output_dir)
    output_path.mkdir(exist_ok=True)
    print(f"✓ Output directory: {output_path.absolute()}")
    # Create encoder instance
    try:
        encoder = nvimgcodec.Encoder()
        print("✓ nvImageCodec Encoder created successfully")
    except Exception as e:
        print(f"✗ Failed to create encoder: {e}")
        return False
    # Generate and save images
    total_time = 0
    successful_saves = 0
    print(f"\n🔄 Generating and saving {num_images} 16-bit TIFF images ({width}x{height})...")
    for i in range(num_images):
        try:
            # Create dummy image
            start_time = time.perf_counter()
            image_tensor = create_dummy_image(width, height, device)
            # Save as TIFF
            output_file = output_path / f"test_image_{i:04d}.tiff"
            # IMPORTANT: nvImageCodec should handle torch tensors with __cuda_array_interface__
            encoder.write(str(output_file), image_tensor)
            end_time = time.perf_counter()
            elapsed = end_time - start_time
            total_time += elapsed
            successful_saves += 1
            print(f" ✓ Saved {output_file.name} in {elapsed:.3f}s")
            # Print tensor info for first image
            if i == 0:
                print(f" 📊 Tensor info: shape={image_tensor.shape}, dtype={image_tensor.dtype}, device={image_tensor.device}")
                if hasattr(image_tensor, '__cuda_array_interface__'):
                    print(f" 🎯 CUDA array interface available")
                else:
                    print(f" ⚠️ No CUDA array interface")
        except Exception as e:
            print(f" ✗ Failed to save image {i}: {e}")
            continue
    # Print summary
    avg_time = total_time / successful_saves if successful_saves > 0 else 0
    print(f"\n📈 Results:")
    print(f" • Successfully saved: {successful_saves}/{num_images} images")
    print(f" • Total time: {total_time:.3f}s")
    print(f" • Average time per image: {avg_time:.3f}s")
    print(f" • Images per second: {1/avg_time:.1f}" if avg_time > 0 else " • Images per second: N/A")
    # Check file sizes
    if successful_saves > 0:
        first_file = output_path / "test_image_0000.tiff"
        if first_file.exists():
            file_size_mb = first_file.stat().st_size / (1024 * 1024)
            print(f" • File size: {file_size_mb:.2f} MB per image")
    return successful_saves == num_images

def test_cpu_vs_gpu_performance(width=1920, height=1080):
    """Compare CPU vs GPU tensor performance."""
    print(f"\n🚀 Performance comparison: CPU vs GPU tensors")
    if not torch.cuda.is_available():
        print("⚠️ CUDA not available, skipping GPU test")
        return
    encoder = nvimgcodec.Encoder()
    output_path = Path("test_tiff_output")
    # Test with CPU tensor
    print(" Testing CPU tensor...")
    cpu_tensor = create_dummy_image(width, height, 'cpu')
    start_time = time.perf_counter()
    encoder.write(str(output_path / "cpu_test.tiff"), cpu_tensor)
    cpu_time = time.perf_counter() - start_time
    print(f" CPU time: {cpu_time:.3f}s")
    # Test with GPU tensor
    print(" Testing GPU tensor...")
    gpu_tensor = create_dummy_image(width, height, 'cuda')
    start_time = time.perf_counter()
    encoder.write(str(output_path / "gpu_test.tiff"), gpu_tensor)
    gpu_time = time.perf_counter() - start_time
    print(f" GPU time: {gpu_time:.3f}s")
    if gpu_time < cpu_time:
        speedup = cpu_time / gpu_time
        print(f" 🎯 GPU is {speedup:.1f}x faster!")
    else:
        slowdown = gpu_time / cpu_time
        print(f" ⚠️ GPU is {slowdown:.1f}x slower (unexpected)")

def main():
    parser = argparse.ArgumentParser(description="Test nvImageCodec TIFF encoding")
    parser.add_argument("--num_images", type=int, default=5, help="Number of test images to generate")
    parser.add_argument("--width", type=int, default=1920, help="Image width")
    parser.add_argument("--height", type=int, default=1080, help="Image height")
    parser.add_argument("--output_dir", type=str, default="test_tiff_output", help="Output directory")
    parser.add_argument("--performance_test", action="store_true", help="Run CPU vs GPU performance comparison")
    args = parser.parse_args()
    print("🧪 nvImageCodec TIFF Encoding Test")
    print("=" * 50)
    # Basic encoding test
    success = test_nvimgcodec_tiff_encoding(
        num_images=args.num_images,
        width=args.width,
        height=args.height,
        output_dir=args.output_dir
    )
    # Performance comparison
    if args.performance_test:
        test_cpu_vs_gpu_performance(args.width, args.height)
    if success:
        print("\n🎉 All tests passed! nvImageCodec TIFF encoding is working correctly.")
    else:
        print("\n❌ Some tests failed. Check the errors above.")
        exit(1)


if __name__ == "__main__":
    main()
There is no way to quit it other than killing the process. The code is LLM-generated, but since it is small I thought I'd add it here.
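Since the hang can only be ended by killing the process, one way to keep a reproduction from blocking a terminal or CI job is to run the script as a child process with a timeout. This is a rough sketch using only the Python standard library. `run_with_timeout` is a hypothetical helper, and the script name in the usage note is a placeholder.

```python
import subprocess
import sys


def run_with_timeout(args, timeout_s):
    """Run a command; return its exit code, or None if it timed out.

    Hypothetical helper for reproducing hangs without blocking forever.
    """
    try:
        completed = subprocess.run(args, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child process before re-raising
        # TimeoutExpired, so nothing is left running here.
        return None
    return completed.returncode
```

Usage could look like `run_with_timeout([sys.executable, "repro_script.py", "--performance_test"], 60)`, where a return value of `None` means the script was still running after 60 seconds and was killed.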
Hi @s1ddok,
Your example worked for me without any issues on v0.5 when I didn't pass any arguments. However, it did hang when I passed the --performance_test option, and the hang occurred during CPU tensor encoding.
If you enable a higher logging level (via the PYNVIMGCODEC_VERBOSITY=2 environment variable), you will see a message:
[WARNING] [pynvimgcodec] Input object #0 cannot be converted to Image. Unsupported device in DLTensor. Only CUDA-accessible memory buffers can be wrapped.
nvImageCodec does not yet support tensors on the CPU; they need to be on the GPU.
In v0.5, there was a bug where encoding would freeze (as you saw) if no valid input was provided. This was fixed in v0.6. Please update and verify that your example works now.
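On v0.5, where an invalid input led to a freeze, a small pre-check in user code can turn the silent hang into an immediate error. This is a minimal sketch, assuming the input is a PyTorch tensor (only its `is_cuda` and `device` attributes are used). `require_cuda_tensor` is a hypothetical helper, not an nvImageCodec API.

```python
def require_cuda_tensor(tensor):
    """Return `tensor` unchanged if it lives on a CUDA device, else raise.

    Hypothetical guard: it only inspects the tensor and never calls
    nvImageCodec itself.
    """
    if not tensor.is_cuda:
        raise ValueError(
            f"tensor is on device '{tensor.device}'; move it to the GPU "
            "(for example with tensor.cuda()) before encoding"
        )
    return tensor
```

In the test script above, the call would become `encoder.write(str(output_file), require_cuda_tensor(image_tensor))`, so a CPU tensor fails fast with a `ValueError` instead of reaching the encoder.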
If it hangs for you with CUDA tensors too, could you share your Python environment versions? I would suggest updating to the newest versions, as I didn't observe any problems with CUDA tensors.