StreamWriter: The h264_nvenc/hevc_nvenc encoder supports the YUV420P format

🚀 The feature

Are there plans for the module that uses nvenc for accelerated encoding to support the yuv420p format?

Motivation, pitch

I am using this module for accelerated video encoding, but upon examining the code, I found that the only supported YUV output format is YUV444.

Alternatives

No response

Additional context

No response

May 30 '23 02:05 doraxcyle

Hi @xcyl

This is something I have been thinking about as well. What are your thoughts on the returned color channel format? For now, we support formats whose planes all have the same size (RGB24, YUV444). One way to make this work is to return a YUV420p image as a one-channel tensor of size (width x 1.5*height), with the UV planes attached at the bottom (or at the right side, giving 1.5*width x height).
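To make the single-plane idea concrete, here is a minimal pure-Python sketch of the "UV at the bottom" layout. The helper names `pack_yuv420p` and `unpack_yuv420p` are hypothetical and are not part of torchaudio; the sketch assumes 8-bit samples and even frame dimensions, and uses plain `bytes` instead of tensors.

```python
def pack_yuv420p(y: bytes, u: bytes, v: bytes, width: int, height: int) -> list:
    """Stack planar YUV420p data into rows of `width` bytes (hypothetical helper).

    Returns height * 3 // 2 rows: the first `height` rows are the Y plane,
    and the remaining rows hold the U plane followed by the V plane.
    """
    if width % 2 or height % 2:
        raise ValueError("4:2:0 subsampling needs even width and height")
    chroma_size = (width // 2) * (height // 2)
    if len(y) != width * height or len(u) != chroma_size or len(v) != chroma_size:
        raise ValueError("plane sizes do not match the frame dimensions")
    buf = bytes(y) + bytes(u) + bytes(v)
    return [buf[i:i + width] for i in range(0, len(buf), width)]


def unpack_yuv420p(rows: list, width: int, height: int) -> tuple:
    """Inverse of pack_yuv420p: recover the (y, u, v) planes."""
    buf = b"".join(rows)
    y_size = width * height
    chroma_size = (width // 2) * (height // 2)
    y = buf[:y_size]
    u = buf[y_size:y_size + chroma_size]
    v = buf[y_size + chroma_size:y_size + 2 * chroma_size]
    return y, u, v


# 4x2 frame: 8 luma samples, then 2 samples each for U and V.
frame = pack_yuv420p(bytes(range(8)), bytes([100, 101]), bytes([200, 201]), 4, 2)
```

Because the chroma planes are half as wide as the frame, each full-width row in the bottom region holds two chroma rows, so the caller has to know the layout to interpret the result.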

May 30 '23 02:05 mthrok

I found that when I set the output format to rgb24/bgr24, the encoded data is in yuv420. Is this the behavior of nvenc?

May 31 '23 02:05 doraxcyle

When I use a specified StreamReader to decode h264 data in yuv420p format using cuvid, what is the output format? I tried to extract the data using nv12 and convert it to rgb24, but it didn't work.

May 31 '23 02:05 doraxcyle

Oh, sorry, I mistook StreamWriter for StreamReader; I thought the issue was about decoding. Currently, pixel format conversion is not implemented for the GPU encoder/decoder. This is because FFmpeg does not provide a flexible interface.

There is scale_cuda filter support on the main branch (https://github.com/pytorch/audio/pull/3183), which allows converting the pixel format when decoding video with nvdec. This has not been ported to the StreamWriter side.

On the decoding side, StreamReader currently converts the video frames to YUV444 regardless of the decoded format (YUV420p or nv12). To support YUV420P, we need to change the output format to the single-plane format I mentioned earlier.
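For illustration, here is roughly what turning 4:2:0 chroma into YUV444-sized planes involves. This is a pure-Python sketch with hypothetical helper names; it uses nearest-neighbor repetition, which is an assumption, since the interpolation StreamReader actually applies is not stated here.

```python
def split_nv12_chroma(uv: bytes) -> tuple:
    """NV12 stores chroma interleaved as U, V, U, V, ...; split it into planes."""
    return uv[0::2], uv[1::2]


def upsample_chroma_nearest(plane: bytes, width: int, height: int) -> bytes:
    """Expand a (height // 2) x (width // 2) chroma plane to height x width.

    Each chroma sample is repeated over a 2x2 block (nearest neighbor).
    """
    cw = width // 2
    out = bytearray()
    for i in range(height):
        # Every pair of output rows reads from the same chroma row.
        src_row = plane[(i // 2) * cw:(i // 2 + 1) * cw]
        for sample in src_row:
            out += bytes([sample, sample])
    return bytes(out)


# 4x4 frame: the chroma plane is 2x2 before upsampling.
u_small, v_small = split_nv12_chroma(bytes([1, 9, 2, 9, 3, 9, 4, 9]))
u_full = upsample_chroma_nearest(u_small, 4, 4)
```

After this step Y, U and V all have width x height samples, which is why YUV444 fits the current "same plane size" output format.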

On the encoding side, if I understand you correctly, you have RGB frames and want to encode them as YUV420p. This is not yet implemented. The reason is simply that FFmpeg does not provide an implementation we can reuse. One possibility is to use the scale_cuda filter to change the pixel format, but IIRC it does not provide RGB->YUV conversion.
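As a CPU reference for what such a conversion would compute, here is a minimal pure-Python sketch of RGB to YUV420p. The function name `rgb_to_yuv420p` is hypothetical, and the full-range BT.601 (JFIF) coefficients and 2x2 chroma averaging are assumptions; which matrix and range the encoder expects is not settled in this thread, and this is not a CUDA implementation.

```python
def rgb_to_yuv420p(rgb: list) -> tuple:
    """Convert rows of (r, g, b) tuples (0-255) to planar YUV420p bytes.

    Assumes full-range BT.601 coefficients and even frame dimensions.
    Chroma is subsampled by averaging each 2x2 block.
    """
    height, width = len(rgb), len(rgb[0])
    if width % 2 or height % 2:
        raise ValueError("4:2:0 subsampling needs even width and height")

    def clamp(x: float) -> int:
        return max(0, min(255, int(round(x))))

    y_plane = bytearray()
    u_full, v_full = [], []
    for row in rgb:
        u_row, v_row = [], []
        for r, g, b in row:
            y_plane.append(clamp(0.299 * r + 0.587 * g + 0.114 * b))
            u_row.append(-0.168736 * r - 0.331264 * g + 0.5 * b + 128)
            v_row.append(0.5 * r - 0.418688 * g - 0.081312 * b + 128)
        u_full.append(u_row)
        v_full.append(v_row)

    def subsample(plane: list) -> bytes:
        out = bytearray()
        for i in range(0, height, 2):
            for j in range(0, width, 2):
                total = (plane[i][j] + plane[i][j + 1]
                         + plane[i + 1][j] + plane[i + 1][j + 1])
                out.append(clamp(total / 4))
        return bytes(out)

    return bytes(y_plane), subsample(u_full), subsample(v_full)


# A 2x2 mid-gray frame maps to Y = U = V = 128.
gray_y, gray_u, gray_v = rgb_to_yuv420p([[(128, 128, 128)] * 2] * 2)
```

A GPU version would perform the same per-pixel arithmetic in a kernel; the sketch only pins down the expected output.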

I am looking to learn CUDA and hoping to implement these conversions, but currently there is no ETA for completing this. If someone with expertise in CUDA and video processing can make a proposal, I am happy to consider it.

May 31 '23 02:05 mthrok
