
Setting poc_type=2 on rv1126 results in a corrupted bitstream (not decodable)

Open Consti10 opened this issue 4 years ago • 23 comments

Hello,

I want to use poc_type=2 for encoding on the rv1126, since a stream with frame reordering disabled can generally be decoded with lower latency by hardware decoders. (For reference: a poc_type of 2 means that frame reordering is disabled. For live streaming, which is the default use case of the rv1126, frame reordering hardly makes sense.)

I set it with mpp_enc_cfg_set_u32(enc_cfg, "h264:poc_type", 2); in the rkmedia setup method MPPCommonConfig::InitConfig.

However, the resulting bitstream cannot be decoded by GStreamer or any other decoding library. Analyzing the bitstream shows that poc_type is indeed set to 2.

I suspect the hardware encoder "driver" never receives this information and therefore still generates a stream for poc_type=0.

What is the recommended way to use poc_type=2 on the rv1126? Are there any quirks to be aware of when using poc_type=2 on the rv1126? I've already applied the fixes from https://github.com/rockchip-linux/mpp/issues/201.

Best regards, Constantin

Consti10, Sep 06 '21

Hi,

RV1126 does not support the configuration of poc_type=2.

I'm curious about your claim that disabling reordering brings lower latency. Our encoder uses a single reference frame by default, so there should be no reordering.

FumasterLin, Sep 10 '21

That's the point: why use poc_type=0 when the encoder can't do any reordering anyway? The increased latency comes from the decoder, which doesn't know there is no reordering unless poc_type is set to 2. Is this a software or a hardware limitation of the rv1126?

Consti10, Sep 10 '21

My understanding is that poc_type only determines how the POC is calculated and does not affect whether frames are reordered. Also, our encoder does not encode B-frames, so in principle there is no reordering.

FumasterLin, Sep 10 '21

I think poc_type determines two things: 1) whether reordering is allowed at all (poc_type=0: reordering allowed; poc_type=2: reordering not allowed) and 2) how the POC is calculated (pretty complicated).

Which parameters of the rv1126 encoder can be configured to decrease the decoding latency? All I can say is what I have observed on Qualcomm Snapdragon decoders: poc_type=2 => low decoding latency, poc_type=0 => high decoding latency.

Consti10, Sep 10 '21

Or in other words: if I decode a stream generated by the rv1126 encoder on a Snapdragon decoder, I get a decoding latency of >50 ms, whereas for a stream generated by, for example, a Jetson Nano (poc_type=2) the decoding latency is <10 ms.

Consti10, Sep 10 '21

When I look at this file: https://github.com/rockchip-linux/mpp/blob/develop/mpp/hal/rkenc/h264e/hal_h264e_vepu541.c#L507

one can see which parameters are configurable (i.e. changing them in the SPS actually maps to register changes in the hardware encoder), right?

And since poc_type is missing there, you conclude that changing the poc_type on the rv1126 is not possible. Is the spec sheet of the rv1126 encoder publicly available? Perhaps you just didn't bother exposing the register responsible for poc_type?

Consti10, Sep 11 '21

> Or in other words: if I decode a stream generated by the rv1126 encoder on a Snapdragon decoder, I get a decoding latency of >50 ms, whereas for a stream generated by, for example, a Jetson Nano (poc_type=2) the decoding latency is <10 ms.

What resolution is the stream you generated with the rv1126? You can decode it on the rv1126 and print the decoding time with this command:

    echo 0x104 > /sys/module/rk_vcodec/parameters/mpp_dev_debug

> I think poc_type determines two things: 1) whether reordering is allowed at all (poc_type=0: reordering allowed; poc_type=2: reordering not allowed) and 2) how the POC is calculated (pretty complicated).

The H.264 specification does not state that poc_type is the factor that determines reordering. Whether reordering is used should be read from the ref_pic_list_reordering() syntax.

> And since poc_type is missing there, you conclude that changing the poc_type on the rv1126 is not possible. Is the spec sheet of the rv1126 encoder publicly available? Perhaps you just didn't bother exposing the register responsible for poc_type?

Yes, poc_type affects the syntax elements of the slice header, and the slice header is generated by the encoder hardware. The encoder is fixed to poc_type=0, and this is not exposed for external configuration.

FumasterLin, Sep 13 '21

I'm using 1080p for my tests. I was able to confirm that decoding on the rv1126 with poc_type=0 in <10 ms is possible, but that doesn't help when the decoder is not a Rockchip device (which is quite common, e.g. a Rockchip device as encoder and a smartphone as display/decoder).

If I look at the registers on the rv1126 that are controllable by the user, I find log2_max_frame_num_minus4 and log2_max_poc_lsb_minus4.

How can I change them? It is not clear to me how you differentiate between values (inside the SPS) that can be changed by user space and values that cannot.

I understand that some values (configurations) in the SPS can be changed by user space. In that case the encoder obviously needs to know that it has to adapt to these changes (registers).

Other values are not bound to hardware registers and therefore cannot be changed in the SPS by user space.

Consti10, Sep 13 '21

> I'm using 1080p for my tests. I was able to confirm that decoding on the rv1126 with poc_type=0 in <10 ms is possible, but that doesn't help when the decoder is not a Rockchip device (which is quite common, e.g. a Rockchip device as encoder and a smartphone as display/decoder).

Our encoder generates a standard stream, and decoders decode it according to the standard. If our decoder decodes the stream in <10 ms while other devices take >50 ms, it should be a problem of those devices, such as insufficient performance. You could test with more devices.

> If I look at the registers on the rv1126 that are controllable by the user, I find log2_max_frame_num_minus4 and log2_max_poc_lsb_minus4. How can I change them? It is not clear to me how you differentiate between values (inside the SPS) that can be changed by user space and values that cannot.

You can configure log2_max_frame_num_minus4 like this:

    mpp_enc_cfg_set_u32(cfg, "h264:log2_max_frm_num", xxx);

log2_max_poc_lsb_minus4 is configured the same way.

The values a user can configure are defined in this file: https://github.com/rockchip-linux/mpp/blob/develop/mpp/base/mpp_enc_cfg.cpp

    ENTRY(h264, log2_max_poc_lsb,   U32, RK_U32,        MPP_ENC_H264_CFG_CHANGE_MAX_POC_LSB,    codec.h264, log2_max_poc_lsb) \
    ENTRY(h264, log2_max_frm_num,   U32, RK_U32,        MPP_ENC_H264_CFG_CHANGE_MAX_FRM_NUM,    codec.h264, log2_max_frame_num) \

FumasterLin, Sep 14 '21

I was able to change both log2_max_frame_num_minus4 and log2_max_poc_lsb_minus4 to 1. However, it didn't change anything regarding decoding latency. (As expected, but it was worth a try.)

> Our encoder generates a standard stream, and decoders decode it according to the standard. If our decoder decodes the stream in <10 ms while other devices take >50 ms, it should be a problem of those devices, such as insufficient performance. You could test with more devices.

It is not that simple. We have tested a wide variety of (Qualcomm Snapdragon) devices, and they all share this "flaw": streams with poc_type=0 cannot be decoded with low latency (more than one frame, around 5 frames, is buffered). Fed the same 1080p stream with poc_type=2, they decode it with low latency. See https://github.com/google/ExoPlayer/issues/8514

In this sense, Qualcomm decoders don't adhere exactly to the H.264 specification, but they are in >90% of all mobile devices, so if you want low latency you kind of have to play along (i.e. use an encoder that is capable of poc_type=2).

Could you check the hardware spec sheet to see whether poc_type can be changed via a register on the rv1126?

Consti10, Sep 14 '21

> I was able to change both log2_max_frame_num_minus4 and log2_max_poc_lsb_minus4 to 1. However, it didn't change anything regarding decoding latency. (As expected, but it was worth a try.)

I think these two configurations have nothing to do with low-latency decoding.

You can apply the attached diff and test the poc_type=2 configuration.

poc_type2.zip

FumasterLin, Sep 15 '21

Thanks, I'll try it out as soon as possible.

What exactly does this patch do? To me it looks like you've copied some code from hal_h264e_vepu_v2.h into the rkenc HAL?

Consti10, Sep 15 '21

As discussed before, the rv1126 encoder does not support poc_type=2. So to support it, the only option is to modify the slice header in software after the hardware has finished encoding.

FumasterLin, Sep 15 '21

Cool, I am surprised this doesn't require more software code. Why did you write this part of the code for the (older) vepu units? Do they also only support setting poc_type=2 in software?

Consti10, Sep 15 '21

Hi, I've tried out your patch. Analyzing the bitstream, poc_type is successfully set to 2, but the stream is still not decodable. Here is how it looks with the amend patch and poc_type=2:

[Screenshot from 2021-09-17 11-02-04]

And here is how it should look (poc_type=0):

[Screenshot from 2021-09-17 11-03-14]

Files: poc_2_and_poc_0.zip

tx1: poc_type=2
tx2: poc_type=0 (poc_type not set)

Consti10, Sep 17 '21

I've noticed that one more amend function is unused in your diff: h264e_vepu_stream_amend_sync_ref_idc(amend);

Did you forget it in your code, or is it not needed?

Consti10, Sep 17 '21

My test with poc_type=2 works. Please test again using the attached mpi_enc_test and librockchip_mpp.so instead of your own.

Test usage:

    mpi_enc_test -w <width> -h <height> -i <input.yuv> -t 7 -o <output.h264> -n <num> -f <yuv_format>

For example:

    mpi_enc_test -w 1920 -h 1080 -t 7 -i 420sp.yuv -o 1920x1080.h264 -n 10 -f 0

Attachment: so_test.zip

FumasterLin, Sep 17 '21

I've been able to confirm that this software workaround (rewriting poc_type) works and has the expected effect, for example on my Pixel 3: with the amend workaround and poc_type set to 2, the decoding time is much lower than with poc_type=0.

The imperfections were gone once I added "h264e_vepu_stream_amend_sync_ref_idc(amend);" to your original diff.

Thanks a lot! Are you going to merge this into the main branch?

Consti10, Sep 30 '21

Do you mean that with h264e_vepu_stream_amend_sync_ref_idc added to my diff there are no problems, and without it there still are?

Could you show me the final diff?

FumasterLin, Oct 11 '21

In the end, I just copied the amend code into a header file: https://github.com/Consti10/merged/blob/master/external/mpp/mpp/hal/rkenc/h264e/hal_h264e_vepu541_ammend.h

prefixed the functions with "X" to avoid compilation errors,

and completed the code using your diff and the original implementation: https://github.com/Consti10/merged/blob/master/external/mpp/mpp/hal/rkenc/h264e/hal_h264e_vepu541.c

Consti10, Oct 11 '21

Adding this part:

    } else if (amend->prefix) {
        amend->old_length = length;
        Xh264e_vepu_stream_amend_sync_ref_idc(amend);
    }

to your original diff should do the trick. You probably forgot to include it in the diff.

Consti10, Oct 11 '21

> Adding this part to your original diff should do the trick. You probably forgot to include it in the diff.

As I understand it, this modification should be unnecessary: if the user configures poc_type=2, the condition "amend->enable" is always true, so the "amend->prefix" branch is never reached.

Could you double-check this?

FumasterLin, Oct 12 '21

I will double-check and create a pull request once done.

Consti10, Nov 17 '21