sipsorcery
Add H264 depacketisation
Currently the RTPSession class supports H264 packetisation but not depacketisation. That leaves VP8 as the only fully supported video codec. This issue is to capture the need to add the feature. Ideally the H264 packetisation logic should be refactored out of RTPSession into a separate class at the same time.
@sipsorcery You can use my payload processor to do it.
Just call ProcessRTPPayload and, if the return value is not null, the frame is complete (Annex-B slices are supported). Keep in mind that this processor does not handle packets with different timestamps, as it is not a video jitter buffer itself.
Example: the frame with timestamp 100 below will be lost, because the processor clears its buffer when it receives a new timestamp (200):
seqNum 1, timestamp 100, markbit 0
seqNum 3, timestamp 100, markbit 1
seqNum 4, timestamp 200, markbit 1
seqNum 2, timestamp 100, markbit 0
Example 2: if slices with the same timestamp arrive out of order, the payload processor can sort them, so the sequence below still yields a valid H264 frame:
seqNum 1, timestamp 100, markbit 0
seqNum 3, timestamp 100, markbit 1
seqNum 2, timestamp 100, markbit 0
seqNum 4, timestamp 200, markbit 1
using System;
using System.Collections.Generic;
using System.IO;
namespace SIPSorcery
{
/// <summary>
/// Based on https://github.com/BogdanovKirill/RtspClientSharp/blob/master/RtspClientSharp/MediaParsers/H264VideoPayloadParser.cs
/// Distributed under MIT License
///
/// @author [email protected]
/// </summary>
public class H264PayloadProcessor
{
#region Consts
const int SPS = 7; // Sequence Parameter Set
const int PPS = 8; // Picture Parameter Set
const int IDR_SLICE = 5; // Coded slice of an IDR picture
const int NON_IDR_SLICE = 1; // Coded slice of a non-IDR picture
#endregion
#region Private Variables
//Payload Helper Fields
uint previous_timestamp = 0;
int norm = 0, fu_a = 0, fu_b = 0, stap_a = 0, stap_b = 0, mtap16 = 0, mtap24 = 0; // diagnostic counters for each RTP packetisation type
List<KeyValuePair<int, byte[]>> temporary_rtp_payloads = new List<KeyValuePair<int, byte[]>>(); // used to assemble the RTP packets that form one RTP frame
MemoryStream fragmented_nal = new MemoryStream(); // used to concatenate fragmented H264 NALs where a NAL is split across RTP packets
#endregion
#region Public Functions
public virtual MemoryStream ProcessRTPPayload(byte[] rtpPayload, ushort seqNum, uint timestamp, int markbit, out bool isKeyFrame)
{
List<byte[]> nal_units = ProcessRTPPayloadAsNals(rtpPayload, seqNum, timestamp, markbit, out isKeyFrame);
if (nal_units != null)
{
//Calculate total buffer size
long totalBufferSize = 0;
for (int i = 0; i < nal_units.Count; i++)
{
var nal = nal_units[i];
long remaining = nal.Length;
if (remaining > 0)
totalBufferSize += remaining + 4; //nal + 0001
else
{
nal_units.RemoveAt(i);
i--;
}
}
//Merge the NALs into a single buffer using the Annex-B start code (00 00 00 01)
MemoryStream data = new MemoryStream(new byte[totalBufferSize]);
foreach (var nal in nal_units)
{
data.WriteByte(0);
data.WriteByte(0);
data.WriteByte(0);
data.WriteByte(1);
data.Write(nal, 0, nal.Length);
}
return data;
}
return null;
}
public virtual List<byte[]> ProcessRTPPayloadAsNals(byte[] rtpPayload, ushort seqNum, uint timestamp, int markbit, out bool isKeyFrame)
{
List<byte[]> nal_units = ProcessH264Payload(rtpPayload, seqNum, timestamp, markbit, out isKeyFrame);
return nal_units;
}
#endregion
#region Payload Internal Functions
protected virtual List<byte[]> ProcessH264Payload(byte[] rtp_payload, ushort seqNum, uint rtp_timestamp, int rtp_marker, out bool isKeyFrame)
{
if (previous_timestamp != rtp_timestamp && previous_timestamp > 0)
{
temporary_rtp_payloads.Clear();
previous_timestamp = 0;
fragmented_nal.SetLength(0);
}
// Add to the list of payloads for the current Frame of video
temporary_rtp_payloads.Add(new KeyValuePair<int, byte[]>(seqNum, rtp_payload)); // TODO could optimise this and go direct to Process Frame if just 1 packet in frame
if (rtp_marker == 1)
{
//Reorder to prevent UDP incorrect package order
if (temporary_rtp_payloads.Count > 1)
temporary_rtp_payloads.Sort((a, b) => { return a.Key.CompareTo(b.Key); });
// End Marker is set. Process the list of RTP Packets (forming 1 RTP frame) and save the NALs to a file
List<byte[]> nal_units = ProcessH264PayloadFrame(temporary_rtp_payloads, out isKeyFrame);
temporary_rtp_payloads.Clear();
previous_timestamp = 0;
fragmented_nal.SetLength(0);
return nal_units;
}
else
{
isKeyFrame = false;
previous_timestamp = rtp_timestamp;
return null; // we don't have a frame yet. Keep accumulating RTP packets
}
}
// Process a RTP Frame. A RTP Frame can consist of several RTP Packets which have the same Timestamp
// Returns a list of NAL Units (with no 00 00 00 01 header and with no Size header)
protected virtual List<byte[]> ProcessH264PayloadFrame(List<KeyValuePair<int, byte[]>> rtp_payloads, out bool isKeyFrame)
{
bool? isKeyFrameNullable = null;
List<byte[]> nal_units = new List<byte[]>(); // Stores the NAL units for a Video Frame. May be more than one NAL unit in a video frame.
for (int payload_index = 0; payload_index < rtp_payloads.Count; payload_index++)
{
// Examine the first rtp_payload and the first byte (the NAL header)
int nal_header_f_bit = (rtp_payloads[payload_index].Value[0] >> 7) & 0x01;
int nal_header_nri = (rtp_payloads[payload_index].Value[0] >> 5) & 0x03;
int nal_header_type = (rtp_payloads[payload_index].Value[0] >> 0) & 0x1F;
// If the Nal Header Type is in the range 1..23 this is a normal NAL (not fragmented)
// So write the NAL to the file
if (nal_header_type >= 1 && nal_header_type <= 23)
{
norm++;
//Check if this is a key frame
CheckKeyFrame(nal_header_type, ref isKeyFrameNullable);
nal_units.Add(rtp_payloads[payload_index].Value);
}
// Aggregation packets: types 24-27 pack multiple NALs into a single RTP payload
else if (nal_header_type == 24)
{
stap_a++;
// STAP-A: the RTP payload contains multiple NALs, each preceded by a 16 bit size field
// Read the 16 bit size
// Read the NAL
try
{
int ptr = 1; // start after the nal_header_type which was '24'
// if we have at least 2 more bytes (the 16 bit size) then consume more data
while (ptr + 2 < (rtp_payloads[payload_index].Value.Length - 1))
{
int size = (rtp_payloads[payload_index].Value[ptr] << 8) + (rtp_payloads[payload_index].Value[ptr + 1] << 0);
ptr = ptr + 2;
byte[] nal = new byte[size];
Buffer.BlockCopy(rtp_payloads[payload_index].Value, ptr, nal, 0, size); // copy the NAL
byte reconstructed_nal_type = (byte)((nal[0] >> 0) & 0x1F);
//Check if this is a key frame
CheckKeyFrame(reconstructed_nal_type, ref isKeyFrameNullable);
nal_units.Add(nal); // Add to list of NALs for this RTP frame. Start Codes like 00 00 00 01 get added later
ptr = ptr + size;
}
}
catch
{
}
}
else if (nal_header_type == 25)
{
stap_b++;
}
else if (nal_header_type == 26)
{
mtap16++;
}
else if (nal_header_type == 27)
{
mtap24++;
}
else if (nal_header_type == 28)
{
fu_a++;
// Parse Fragmentation Unit Header
int fu_indicator = rtp_payloads[payload_index].Value[0];
int fu_header_s = (rtp_payloads[payload_index].Value[1] >> 7) & 0x01; // start marker
int fu_header_e = (rtp_payloads[payload_index].Value[1] >> 6) & 0x01; // end marker
int fu_header_r = (rtp_payloads[payload_index].Value[1] >> 5) & 0x01; // reserved. should be 0
int fu_header_type = (rtp_payloads[payload_index].Value[1] >> 0) & 0x1F; // Original NAL unit header
// Check Start and End flags
if (fu_header_s == 1 && fu_header_e == 0)
{
// Start of Fragment.
// Initialise the fragmented_nal byte array
// Build the NAL header with the original F and NRI flags but use the Type field from the fu_header_type
byte reconstructed_nal_type = (byte)((nal_header_f_bit << 7) + (nal_header_nri << 5) + fu_header_type);
// Empty the stream
fragmented_nal.SetLength(0);
// Add reconstructed_nal_type byte to the memory stream
fragmented_nal.WriteByte((byte)reconstructed_nal_type);
// copy the rest of the RTP payload to the memory stream
fragmented_nal.Write(rtp_payloads[payload_index].Value, 2, rtp_payloads[payload_index].Value.Length - 2);
}
if (fu_header_s == 0 && fu_header_e == 0)
{
// Middle part of Fragment
// Append this payload to the fragmented_nal
// Data starts after the NAL Unit Type byte and the FU Header byte
fragmented_nal.Write(rtp_payloads[payload_index].Value, 2, rtp_payloads[payload_index].Value.Length - 2);
}
if (fu_header_s == 0 && fu_header_e == 1)
{
// End part of Fragment
// Append this payload to the fragmented_nal
// Data starts after the NAL Unit Type byte and the FU Header byte
fragmented_nal.Write(rtp_payloads[payload_index].Value, 2, rtp_payloads[payload_index].Value.Length - 2);
var fragmented_nal_array = fragmented_nal.ToArray();
byte reconstructed_nal_type = (byte)((fragmented_nal_array[0] >> 0) & 0x1F);
//Check if this is a key frame
CheckKeyFrame(reconstructed_nal_type, ref isKeyFrameNullable);
// Add the NAL to the array of NAL units
nal_units.Add(fragmented_nal_array);
fragmented_nal.SetLength(0);
}
}
else if (nal_header_type == 29)
{
fu_b++;
}
}
isKeyFrame = isKeyFrameNullable != null ? isKeyFrameNullable.Value : false;
// Output all the NALs that form one RTP Frame (one frame of video)
return nal_units;
}
protected void CheckKeyFrame(int nal_type, ref bool? isKeyFrame)
{
// SPS, PPS or an IDR slice indicates a key frame; a non-IDR slice marks the
// frame as a delta frame. Other NAL types (e.g. SEI) leave the flag unchanged.
if (nal_type == SPS || nal_type == PPS || nal_type == IDR_SLICE)
{
isKeyFrame = isKeyFrame ?? true;
}
else if (nal_type == NON_IDR_SLICE)
{
isKeyFrame = false;
}
}
#endregion
}
}
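To show the intended call pattern: feed each video RTP payload into ProcessRTPPayload and treat a non-null return as a complete frame. This is only a sketch; the wrapper class and the OnRtpPacket name are illustrative, and the RTP header fields are assumed to already be parsed by the caller.

using System;
using System.IO;

public class H264DepacketiserDemo
{
    private readonly H264PayloadProcessor _depacketiser = new H264PayloadProcessor();

    // Call once per received video RTP packet (fields taken from the RTP header).
    public void OnRtpPacket(byte[] payload, ushort seqNum, uint timestamp, int markerBit)
    {
        MemoryStream frame = _depacketiser.ProcessRTPPayload(payload, seqNum, timestamp, markerBit, out bool isKeyFrame);

        if (frame != null)
        {
            // A complete Annex-B frame (NALs separated by 00 00 00 01) is ready for the decoder.
            byte[] annexB = frame.ToArray();
            Console.WriteLine($"Frame ready: {annexB.Length} bytes, keyFrame={isKeyFrame}");
        }
    }
}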
Awesome, thanks! I'll see about incorporating that class pronto.
@sipsorcery I wonder if you could add a way to disable this internal frame processing, to prevent duplicate depacketisation when we want to use custom logic.
As I said, this internal implementation doesn't use any kind of jitter buffer. So, if we want to run our own custom logic to decode a frame using OnRtpPacketReceived, we are forced to do the depacketisation twice, because inside RTPSession you always depacketise packets of a known format (H264/VP8).
Maybe letting us decide whether the internal depacketisation runs, by checking if (OnVideoFrameReceived != null) before calling the internal processors, would be a good way to go. What do you think?
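Something along these lines is what I mean. This is only a sketch: the class is a stand-in, and the OnVideoFrameReceived/OnRtpPacketReceived signatures and field names here are assumptions for illustration, not the actual RTPSession members.

using System;
using System.IO;

public class VideoSessionSketch
{
    // Raised for every raw RTP payload, for applications doing their own depacketisation.
    public event Action<byte[], ushort, uint, int> OnRtpPacketReceived;
    // Raised only when the built-in depacketiser has assembled a full frame.
    public event Action<byte[], uint, bool> OnVideoFrameReceived;

    private readonly H264PayloadProcessor _depacketiser = new H264PayloadProcessor();

    public void HandleVideoRtp(byte[] payload, ushort seqNum, uint timestamp, int markerBit)
    {
        OnRtpPacketReceived?.Invoke(payload, seqNum, timestamp, markerBit);

        // Skip the internal depacketisation entirely when nobody wants whole frames.
        if (OnVideoFrameReceived != null)
        {
            MemoryStream frame = _depacketiser.ProcessRTPPayload(payload, seqNum, timestamp, markerBit, out bool isKeyFrame);
            if (frame != null)
            {
                OnVideoFrameReceived(frame.ToArray(), timestamp, isKeyFrame);
            }
        }
    }
}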
Best regards Rafael
I added that change and merged the PR.
The H264 depacketisation now works in some circumstances, which is a big improvement! In other circumstances FFmpeg is unhappy with the encoded frames:
- 480p from Chrome: Works
- 720p from Chrome: Fails
- 480p from MicroSIP (SIP softphone): Fails
In the same situations VP8 works.
The good thing is there is now a starting point to work from.
@sipsorcery I already tested this depacketisation logic with 720p in Chrome.
As I said before, you must ensure one timestamp is finished before feeding packets with another timestamp into H264PayloadProcessor.
If you receive packets in the order below and push them straight into H264PayloadProcessor, both frames are lost:
seqNum 1, timestamp 100, markbit 0
seqNum 3, timestamp 100, markbit 1
seqNum 4, timestamp 200, markbit 1
seqNum 2, timestamp 100, markbit 0
Key frames can be surprisingly large: a single key frame can arrive as more than 50 UDP packets, so using H264PayloadProcessor directly, without checking whether it is already processing another timestamp, will cause a huge number of frames to be lost. A rough sketch of the kind of per-timestamp buffering I mean is shown below.
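This is a hypothetical helper, not part of sipsorcery: it holds packets per RTP timestamp and only pushes a timestamp's packets into H264PayloadProcessor once its marker bit has been seen and its sequence numbers are contiguous. It has no timeouts, no sequence number wrap handling and no loss recovery, so it is not a substitute for a proper jitter buffer.

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public class NaiveTimestampBuffer
{
    private readonly H264PayloadProcessor _depacketiser = new H264PayloadProcessor();
    private readonly Dictionary<uint, List<(ushort Seq, byte[] Payload, int Marker)>> _pending =
        new Dictionary<uint, List<(ushort Seq, byte[] Payload, int Marker)>>();

    public MemoryStream Add(byte[] payload, ushort seqNum, uint timestamp, int markerBit, out bool isKeyFrame)
    {
        isKeyFrame = false;

        if (!_pending.TryGetValue(timestamp, out var packets))
        {
            packets = new List<(ushort Seq, byte[] Payload, int Marker)>();
            _pending[timestamp] = packets;
        }
        packets.Add((seqNum, payload, markerBit));

        var ordered = packets.OrderBy(p => p.Seq).ToList();
        bool markerSeen = ordered.Any(p => p.Marker == 1);
        bool contiguous = ordered.Last().Seq - ordered.First().Seq == ordered.Count - 1;

        if (!markerSeen || !contiguous)
        {
            return null; // keep waiting for late packets of this timestamp
        }

        // Feed the whole, ordered frame into the depacketiser in one go.
        MemoryStream frame = null;
        foreach (var p in ordered)
        {
            frame = _depacketiser.ProcessRTPPayload(p.Payload, p.Seq, timestamp, p.Marker, out isKeyFrame);
        }
        _pending.Remove(timestamp);
        return frame; // non-null = complete Annex-B frame for this timestamp
    }
}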
Another tip is to always use a FormatDescription with PacketizationMode == 1 before sending the SDP offer with H264, as this depacketisation logic only handles that mode.
Best Regards Rafael
Yep I understand what you mean regarding packet loss and out of sequence arrival. I recently added a log warning message so I can quickly identify when that occurs.
The Chrome 720p failure was my fault. In the FFmpeg logic I wasn't re-creating the pixel converted dimensions when the decoded source frame changed. I fixed that and I'm able to decode H264 at 720p and 1080p.
The MicroSIP softphone is using packetization mode 0 but I don't think that's the issue, as that's what Chrome is defaulting to as well (I'm sending the SDP offer without specifying a H264 packetization mode so Chrome uses 0). I'll keep looking into this one.
When using packetization mode 0, the PPS and SPS (required to decode all frames) are not sent with key frames. Instead, in Chrome, the SPS/PPS are only sent in the first packets, so if you lose them all subsequent packets are lost.
With packetization mode 1 the SPS/PPS are sent with each key frame, so a key frame will have 3 or more NALs (SPS, PPS, one or more IDR slices). In this mode you can always recover the SPS/PPS when a new key frame arrives.
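For reference, packetization mode 1 is signalled in the SDP with an fmtp attribute like the one below; the payload type 96 and the profile-level-id are example values only.

m=video 9 UDP/TLS/RTP/SAVPF 96
a=rtpmap:96 H264/90000
a=fmtp:96 packetization-mode=1;profile-level-id=42e01f;level-asymmetry-allowed=1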
From my understanding the packetization mode does not influence the H264 byte stream produced by an encoder. Whether or not the byte stream includes additional PPS and SPS NALs does not change the RTP packetisation. The RTP layer does not understand the different types of NALs; all it does is split them up and package them into RTP packets for sending on the wire. I've refactored that logic out of RTPSession now and put it into a dedicated H264Packetiser class.
It also seems like packetisation-mode 0 and 1 are well understood. I tested with two different softphones as well as Chrome, and an H264 stream packetised as mode 1 was understood whether or not the parameter was set in the SDP.
Could it be an issue related to packet loss? I'm just basing my guess on the receiving side being somehow corrupt, with an effect like this one (more or less) -> http://www.mediapro.cc/rtp抗丢包传输方案/
If so, I think I have a big issue, as this library doesn't have any mechanism for packet loss, right?
There's no packet loss recovery mechanism for now... I started building support for it but it's not finished.
@rafcsoares so happy to hear that; if I can help you somehow, count on me! If you have the work on a branch where I could contribute or test, I'll be happy to help :)
I saw, in a somewhat old Microsoft example, a full implementation of FEC using XOR and also Reed-Solomon, just in case this helps -> https://github.com/conferencexp/conferencexp/blob/master/MSR.LST.Net.Rtp/fec.cs#L47
And a Java implementation of ULPFEC (like what's currently used in WebRTC): https://github.com/jitsi/libjitsi/tree/master/src/main/java/org/jitsi/impl/neomedia/transform/fec
There's no packet loss recovery mechanism for now... I started building support for it but it's not finished.
Is there any packet loss recovery available now? What should be done if UDP H264 packets are dropped and the decoder errors because the expected timestamp/sequence frame is lost?