H264 depacketization fails with `ErrShortPacket`
I'm getting Rtp(ErrShortPacket) after several successful depacketize calls from the H264Packet depacketizer, and am struggling to understand what I'm doing wrong.
The minimal reproduction of this is:
use std::sync::Arc;
use std::time::Duration;
use tokio::io::{AsyncBufReadExt, BufReader};
use tracing::{debug, error, info};
use tracing_subscriber::fmt;
use tracing_subscriber::layer::SubscriberExt;
use webrtc::api::APIBuilder;
use webrtc::api::interceptor_registry::register_default_interceptors;
use webrtc::api::media_engine::{MediaEngine, MIME_TYPE_H264};
use webrtc::ice_transport::ice_server::RTCIceServer;
use webrtc::interceptor::registry::Registry;
use webrtc::peer_connection::configuration::RTCConfiguration;
use webrtc::peer_connection::RTCPeerConnection;
use webrtc::peer_connection::sdp::session_description::RTCSessionDescription;
use webrtc::rtp;
use webrtc::rtp_transceiver::rtp_codec::{RTCRtpCodecCapability, RTCRtpCodecParameters, RTPCodecType};
use webrtc::rtp_transceiver::rtp_receiver::RTCRtpReceiver;
use webrtc::track::track_remote::TrackRemote;
use crate::rtp::codecs::h264::H264Packet;
use crate::rtp::packetizer::Depacketizer;
#[tokio::main()]
async fn main() {
let subscriber = tracing_subscriber::registry()
.with(fmt::Layer::new().with_writer(std::io::stdout).pretty());
tracing::subscriber::set_global_default(subscriber).expect("Unable to set a global collector");
println!("Enter base64 sdp: ");
let stdin = tokio::io::stdin();
let reader = BufReader::new(stdin);
let line = reader.lines().next_line().await.unwrap().unwrap();
let bytes = base64::decode(line).unwrap();
let json = String::from_utf8(bytes).unwrap();
let offer = serde_json::from_str::<RTCSessionDescription>(&json).unwrap();
let _connection = create_connection(offer).await;
loop {
tokio::time::sleep(Duration::from_secs(1)).await;
}
}
async fn receive_rtp_track_media(track: Arc<TrackRemote>) {
info!("Starting rtp track reader");
let track_codec = track.codec().await;
let mime_type = track_codec.capability.mime_type.to_lowercase();
info!("New RTP track started with mime type '{}'", mime_type);
if mime_type != MIME_TYPE_H264.to_lowercase() {
error!("Invalid mime type");
return;
}
let mut has_seen_key_frame = false;
let mut cached_h264_packet = H264Packet::default();
while let Ok((rtp_packet, _)) = track.read_rtp().await {
match handle_h264_packet(rtp_packet, &mut has_seen_key_frame, &mut cached_h264_packet) {
Ok(()) => (),
Err(error) => {
error!("Failed to process rtp packet: {:?}", error);
break;
}
}
}
info!("Stopping rtp track reader");
}
fn handle_h264_packet(
rtp_packet: rtp::packet::Packet,
has_seen_key_frame: &mut bool,
cached_h264_packet: &mut H264Packet,
) -> Result<(), webrtc::Error> {
if rtp_packet.payload.is_empty() {
return Ok(());
}
let is_key_frame = is_key_frame(&rtp_packet.payload);
if !*has_seen_key_frame && !is_key_frame {
return Ok(());
}
*has_seen_key_frame = true;
let payload = cached_h264_packet.depacketize(&rtp_packet.payload)?;
if !payload.is_empty() {
debug!("H264 packet depacketized");
}
Ok(())
}
async fn create_connection(offer: RTCSessionDescription) -> RTCPeerConnection {
let mut media_engine = MediaEngine::default();
media_engine.register_codec(
RTCRtpCodecParameters {
capability: RTCRtpCodecCapability {
mime_type: MIME_TYPE_H264.to_owned(),
clock_rate: 90000,
channels: 0,
sdp_fmtp_line: "".to_owned(),
rtcp_feedback: vec![],
},
payload_type: 102,
..Default::default()
},
RTPCodecType::Video,
).expect("Failed to add h264 to media engine");
let registry = Registry::new();
let registry = register_default_interceptors(registry, &mut media_engine).unwrap();
let api = APIBuilder::new()
.with_media_engine(media_engine)
.with_interceptor_registry(registry)
.build();
let config = RTCConfiguration {
ice_servers: vec![RTCIceServer {
urls: vec!["stun:stun.l.google.com:19302".to_owned()],
..Default::default()
}],
..Default::default()
};
let peer_connection = api.new_peer_connection(config).await.unwrap();
peer_connection.add_transceiver_from_kind(RTPCodecType::Video, &[]).await.unwrap();
peer_connection.on_track(
Box::new(move |track: Option<Arc<TrackRemote>>, _receiver: Option<Arc<RTCRtpReceiver>>| {
if let Some(track) = track {
tokio::spawn(receive_rtp_track_media(track));
}
Box::pin(async {})
})
).await;
peer_connection.set_remote_description(offer).await.unwrap();
let answer = peer_connection.create_answer(None).await.unwrap();
let mut channel = peer_connection.gathering_complete_promise().await;
peer_connection.set_local_description(answer).await.unwrap();
let _ = channel.recv().await;
let answer = peer_connection.local_description().await.unwrap();
let json = serde_json::to_string(&answer).unwrap();
let encoded_json = base64::encode(json);
debug!("Answer sdp of: {}", encoded_json);
peer_connection
}
const NALU_TTYPE_STAP_A: u32 = 24;
const NALU_TTYPE_SPS: u32 = 7;
const NALU_TYPE_BITMASK: u32 = 0x1F;
fn is_key_frame(data: &[u8]) -> bool {
if data.len() < 4 {
false
} else {
let word = u32::from_be_bytes([data[0], data[1], data[2], data[3]]);
let nalu_type = (word >> 24) & NALU_TYPE_BITMASK;
(nalu_type == NALU_TTYPE_STAP_A && (word & NALU_TYPE_BITMASK) == NALU_TTYPE_SPS)
|| (nalu_type == NALU_TTYPE_SPS)
}
}
The Cargo.toml this works with is:
[package]
name = "webrtc-test"
version = "0.1.0"
edition = "2021"
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
[dependencies]
bytes = "1"
tokio = { version = "1.9", features = ["full"] }
futures = "0.3"
tracing = "0.1"
tracing-subscriber = { version = "0.3.2", features = ["json"] }
tracing-futures = "0.2.5"
tracing-appender = "0.2.0"
serde_json = "1.0.79"
webrtc-sdp = "0.3.9"
webrtc = "0.4.0"
base64 = "0.13.0"
To test this, I:
- Execute this code
- Point my browser to the jsfiddle from the broadcast example
- Click Publish a broadcast
- Copy the base64 SDP into the example program's console window
- Copy the base64 SDP response from the example program into the browser and click Start session
The jsfiddle log area shows that it's connected, and I have the following logs from the example application
2022-03-05T18:22:07.925327Z INFO webrtc_test: Starting rtp track reader
at src\main.rs:44
2022-03-05T18:22:07.925377Z INFO webrtc_test: New RTP track started with mime type 'video/h264'
at src\main.rs:48
2022-03-05T18:22:07.925416Z DEBUG webrtc_test: H264 packet depacketized
at src\main.rs:88
2022-03-05T18:22:08.018040Z DEBUG webrtc_test: H264 packet depacketized
at src\main.rs:88
2022-03-05T18:22:08.033860Z DEBUG webrtc_test: H264 packet depacketized
at src\main.rs:88
2022-03-05T18:22:08.034388Z DEBUG webrtc_test: H264 packet depacketized
at src\main.rs:88
2022-03-05T18:22:08.034461Z DEBUG webrtc_test: H264 packet depacketized
at src\main.rs:88
2022-03-05T18:22:08.049109Z DEBUG webrtc_test: H264 packet depacketized
at src\main.rs:88
2022-03-05T18:22:08.096063Z DEBUG webrtc_test: H264 packet depacketized
at src\main.rs:88
2022-03-05T18:22:08.127287Z DEBUG webrtc_test: H264 packet depacketized
at src\main.rs:88
2022-03-05T18:22:08.159258Z DEBUG webrtc_test: H264 packet depacketized
at src\main.rs:88
2022-03-05T18:22:08.191205Z DEBUG webrtc_test: H264 packet depacketized
at src\main.rs:88
2022-03-05T18:22:08.222239Z DEBUG webrtc_test: H264 packet depacketized
at src\main.rs:88
2022-03-05T18:22:08.253229Z DEBUG webrtc_test: H264 packet depacketized
at src\main.rs:88
2022-03-05T18:22:08.285444Z DEBUG webrtc_test: H264 packet depacketized
at src\main.rs:88
2022-03-05T18:22:08.315680Z ERROR webrtc_test: Failed to process rtp packet: Rtp(ErrShortPacket)
at src\main.rs:61
2022-03-05T18:22:08.315715Z INFO webrtc_test: Stopping rtp track reader
at src\main.rs:67
This shows that it's connected. In this example we depacketize 13 H264 packets and then receive a 2 byte packet with the bytes [9, 48], which happens every time.
This is reproducible 100% of the time for me. What am I doing wrong? Following the save h264 example it does not seem like you are special casing this error. The logic in my handle_h264_packet seems to also match up with the H264Writer logic you are using, unless I am missing something.
Well, this is interesting. I can reproduce this 100% of the time on my desktop but not on my laptop. The desktop is using an external webcam while the laptop is using the built-in webcam, but other than that both are using Firefox on Windows 10.
You aren't resequencing the stream. Consider what happens with your code if FU-A split packets arrive out of order.
Also I've observed that at least Chrome sends packets with empty payload lengths
You aren't resequencing the stream. Consider what happens with your code if FU-A split packets arrive out of order.
Can you expand on how resequencing is relevant to my minimal reproduction? I would expect resequencing to happen either at the read_rtp() calls or in the H264Packet.depacketize() functionality, both of which are built directly into webrtc-rs. If neither of these is meant to handle out-of-sequence RTP packets by design, then doesn't that make the H264Packet functionality essentially useless (since that's already meant to turn many FU-A split packets into a single NAL unit, afaict)?
Likewise, with webrtc-rs it's now less clear at which layer packets are re-sequenced, and who is responsible for doing so.
Also I've observed that at least Chrome sends packets with empty payload lengths
This does seem like it could be what I'm seeing. After I had written this bug report I started delving into H264 NAL unit parsing a bit, and it seems like the logic in H264Packet.depacketize() is incorrect in that it expects every H264 packet to be at least 4 bytes. That being said, I don't feel confident in my shallow h264 knowledge to change that, as I assume there's a reason it currently requires at least 4 bytes?
Can you expand on how resequencing is relevant to my minimal reproduction?
There is no resequencing in webrtc-rs itself as far as I know. This still lets it support many use cases where RTP packets simply pass through webrtc-rs, but for tasks such as parsing H264, resequencing is required.
The H264Packet functionality isn't useless, it still keeps the buffer around and your code looks fine (under the assumption that packets are ordered).
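To make the resequencing point concrete, here is a minimal sketch of a reorder buffer that could sit between read_rtp() and depacketize(). `ReorderBuffer` and its `push` method are hypothetical names, not part of webrtc-rs. The sketch assumes the first packet seen starts the sequence and it never gives up on a missing packet, so a real implementation would also need a timeout or size limit to skip over losses.

```rust
use std::collections::HashMap;

/// Hypothetical helper (not a webrtc-rs type): releases payloads in
/// RTP sequence-number order.
struct ReorderBuffer {
    next_seq: Option<u16>,
    pending: HashMap<u16, Vec<u8>>,
}

impl ReorderBuffer {
    fn new() -> Self {
        Self { next_seq: None, pending: HashMap::new() }
    }

    /// Stores one packet and returns every payload that is now deliverable in order.
    fn push(&mut self, seq: u16, payload: Vec<u8>) -> Vec<Vec<u8>> {
        // Assumption: the first packet we ever see defines where the sequence starts.
        let next = *self.next_seq.get_or_insert(seq);

        // With 16-bit wraparound, a distance of 0x8000 or more is treated as
        // "behind us", i.e. a late or duplicate packet, and is dropped.
        if seq.wrapping_sub(next) >= 0x8000 {
            return Vec::new();
        }
        self.pending.insert(seq, payload);

        // Drain the contiguous run starting at the expected sequence number.
        let mut ready = Vec::new();
        let mut cursor = next;
        while let Some(p) = self.pending.remove(&cursor) {
            ready.push(p);
            cursor = cursor.wrapping_add(1);
        }
        self.next_seq = Some(cursor);
        ready
    }
}
```

Each payload returned by `push` could then be handed to the depacketizer in order; a packet that arrives early simply waits in `pending` until the gap before it is filled.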
That being said, I don't feel confident in my shallow h264 knowledge to change that, as I assume there's a reason it currently requires at least 4 bytes?
I'm not sure where you are getting 4 bytes from. To me it looks like it expects the RTP payload to be at least 2 bytes, the first of which will be the NAL unit header octet
+---------------+
|0|1|2|3|4|5|6|7|
+-+-+-+-+-+-+-+-+
|F|NRI|  Type   |
+---------------+
I assume it checks for a length of two because there are no NALUs that are entirely empty
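As a small illustration of the header layout above, the following sketch splits one octet into its F, NRI, and Type fields. `NalHeader` and `parse_nal_header` are hypothetical names used only for this example and are not taken from webrtc-rs.

```rust
/// Fields of the one-octet NAL unit header shown in the diagram above.
/// Hypothetical type for illustration only.
#[derive(Debug, PartialEq, Eq)]
struct NalHeader {
    /// F: forbidden_zero_bit, expected to be 0 in a valid stream.
    forbidden_zero_bit: bool,
    /// NRI: 2-bit nal_ref_idc.
    nal_ref_idc: u8,
    /// Type: 5-bit nal_unit_type (e.g. 7 = SPS, 9 = access unit delimiter,
    /// 24 = STAP-A, 28 = FU-A).
    nal_unit_type: u8,
}

fn parse_nal_header(octet: u8) -> NalHeader {
    NalHeader {
        forbidden_zero_bit: (octet & 0x80) != 0,
        nal_ref_idc: (octet >> 5) & 0x03,
        nal_unit_type: octet & 0x1F,
    }
}
```

For example, `parse_nal_header(0x67)` yields NRI 3 and type 7 (an SPS), while a first octet of 9 yields NRI 0 and type 9.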
I'm not sure where you are getting 4 bytes from. To me it looks like it expects the RTP payload to be at least 2 bytes, the first of which will be the NAL unit header octet
I misremembered, but you are almost correct. The code at https://github.com/webrtc-rs/rtp/blob/2cce83c5618a646df22b9213c040470be9fb803a/src/codecs/h264/mod.rs#L211 wants it to be greater than 2 bytes, not at least 2 bytes. Since I'm getting a 2 byte packet, it's returning a packet too short error. It's not clear to me if a 2 byte H264 NAL unit is valid or not (my assumption is that it is, and that the H264 depacketizer accidentally used <= instead of <, but I'm not confident enough about H264 to know for sure).
The H264Packet functionality isn't useless, it still keeps the buffer around and your code looks fine (under the assumption that packets are ordered).
Ok that's helpful. So in a real scenario my handle_h264_packet() needs to make sure packets are in the correct order and that we aren't missing any before calling H264Packet.depacketize() across all of them. That's not clear from the examples, or even other non-example code. My handle_h264_packet() was derived from the built in H264Writer which doesn't do any re-ordering or even missing packet handling.
Ah, you are indeed correct about it being greater than 2 bytes.
This condition is the same as the one in Pion (which webrtc-rs is a port of), so I think it's correct.
Out of curiosity do you have examples of the 2 byte payloads you are seeing?
That's not clear from the examples, or even other non-example code
Yeah agreed, the examples are a bit incomplete in this sense, but my understanding is definitely that there's no re-sequencing happening anywhere and for a production grade implementation that needs to be added to do things like writing to disk.
it still keeps the buffer around and your code looks fine (under the assumption that packets are ordered).
Actually, you need to reset the packet after parsing an FU-A completely too.
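To show what resetting after a complete FU-A means in practice, here is a standalone sketch of FU-A reassembly following the RFC 6184 layout (FU indicator octet, then FU header octet with start and end bits). `FuaAssembler` is a hypothetical type for illustration and is not how webrtc-rs implements `H264Packet`. It assumes fragments arrive in order and ignores every payload that is not an FU-A fragment.

```rust
const FU_A_TYPE: u8 = 28;
const FU_START_BIT: u8 = 0x80;
const FU_END_BIT: u8 = 0x40;

/// Hypothetical reassembler for FU-A fragments; not the webrtc-rs type.
#[derive(Default)]
struct FuaAssembler {
    buffer: Vec<u8>,
}

impl FuaAssembler {
    /// Feeds one in-order RTP payload. Returns a whole NAL unit once the
    /// end fragment arrives, otherwise None.
    fn push(&mut self, payload: &[u8]) -> Option<Vec<u8>> {
        if payload.len() < 2 || (payload[0] & 0x1F) != FU_A_TYPE {
            return None; // not an FU-A fragment; out of scope for this sketch
        }
        let (indicator, fu_header) = (payload[0], payload[1]);

        if (fu_header & FU_START_BIT) != 0 {
            // Start fragment: drop stale data and rebuild the original NAL
            // header from the indicator's F/NRI bits and the FU header's type.
            self.buffer.clear();
            self.buffer.push((indicator & 0xE0) | (fu_header & 0x1F));
        } else if self.buffer.is_empty() {
            // Middle or end fragment without a preceding start: ignore it.
            return None;
        }
        self.buffer.extend_from_slice(&payload[2..]);

        if (fu_header & FU_END_BIT) != 0 {
            // End fragment: hand the unit out and leave the buffer empty,
            // which is the "reset" so the next FU-A starts clean.
            return Some(std::mem::take(&mut self.buffer));
        }
        None
    }
}
```

The `std::mem::take` on the end fragment is the important part: without it, bytes from one fragmented NAL unit would leak into the next.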
Maybe @rainliu or @Sean-Der would be interested in weighing in?
Out of curiosity do you have examples of the 2 byte payloads you are seeing?
I believe the 2 byte payload I kept seeing was [9, 48] consistently based on my previous notes, but if I get a chance later this evening I'll try running it again.
Ah, that's an access unit delimiter. I guess in theory it's okay for those to be dropped, but maybe the condition should be changed to allow them through.
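Until the length check changes, one possible workaround is to detect access unit delimiters and skip them before calling depacketize(). The sketch below only inspects the first octet: for [9, 48], 9 & 0x1F is 9, the access unit delimiter NAL unit type. `is_access_unit_delimiter` and `NALU_TYPE_AUD` are hypothetical names introduced here, and dropping these packets assumes the consumer does not need the delimiters.

```rust
const NALU_TYPE_BITMASK: u8 = 0x1F;
/// NAL unit type 9 is the access unit delimiter in H.264.
const NALU_TYPE_AUD: u8 = 9;

/// Returns true when the RTP payload starts with an access unit delimiter
/// NAL unit header. Hypothetical helper, not part of webrtc-rs.
fn is_access_unit_delimiter(payload: &[u8]) -> bool {
    matches!(
        payload.first(),
        Some(&octet) if (octet & NALU_TYPE_BITMASK) == NALU_TYPE_AUD
    )
}
```

In the reproduction above, handle_h264_packet() could return Ok(()) early when `is_access_unit_delimiter(&rtp_packet.payload)` is true, in the same place where it already returns early for empty payloads.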
Related Pion pull request: https://github.com/pion/rtp/pull/213