Bug: Arithmetic overflow in mDNS implementation when adding TTL to Instant
Summary
Description
On the Avail Light Client network, we've identified a potential panic in the mDNS implementation when processing response packets with extremely large TTL values. The issue occurs due to an arithmetic overflow when adding a TTL duration to an Instant.
Root Cause
The issue was identified in the extract_discovered method in the query.rs file:
let new_expiration = now + peer.ttl();
Expected behavior
The quoted vulnerable line above should be able to cap the TTL to a reasonable maximum value [say 24 hours].
Actual behavior
This unchecked arithmetic panics when an mDNS packet contains an unusually large TTL value, possibly due to receiving malformed or malicious mDNS packets.
Relevant log output
Error: The application panicked (crashed).
Message: overflow when adding duration to instant
Location: library/std/src/time.rs:417
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ BACKTRACE ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
⋮ 6 frames hidden ⋮
7: core::option::expect_failed::h8c40d2d654ba3611
at <unknown source file>:<unknown line>
8: tokio::time::interval::Interval::poll_tick::hc1814dc20e3605e4
at <unknown source file>:<unknown line>
9: <libp2p_mdns::behaviour::Behaviour<P> as libp2p_swarm::behaviour::NetworkBehaviour>::poll::ha2bef1b1e29f0bbc
Possible Solution
Replace the vulnerable line with a safer implementation that caps the TTL to a reasonable maximum value:
// Cap TTL to a reasonable maximum (e.g., 24 hours)
let capped_ttl = std::cmp::min(peer.ttl(), Duration::from_secs(86400)); // 86400 seconds = 24 hours
let new_expiration = now + capped_ttl;
Another approach could be to use checked_add with a fallback:
let new_expiration = now.checked_add(peer.ttl())
.unwrap_or_else(|| now + Duration::from_secs(3600)); // Fallback to 1 hour if overflow
Version
0.55.0
Would you like to work on fixing this bug?
Yes
Hey! Nice catch on the possible overflow. While I am not against having it fixed to a specific value if it does overflow, I am curious on if we should also have this apart of libp2p spec as well so it would possibly be more consistent in such cases where there is a overflow.
If that overflow is due to a malformed or malicious packet do we even want to handle the content? IMO it's also reasonable to just filter out a peer in a response if the ttl is invalid.
Hey guys, thanks for the prompt responses, really appreciate it.
I agree, the extract_discovered function already does the filtering and seems like the perfect place to add a filter combinator.
Without defining fallback TTL values to compare to, would something like this be a suitable change for the response handling?
pub(crate) fn extract_discovered(
&self,
now: Instant,
local_peer_id: PeerId,
) -> impl Iterator<Item = (PeerId, Multiaddr, Instant)> + '_ {
self.discovered_peers()
.filter(move |peer| peer.id() != &local_peer_id)
// Filter out peers whose TTL would cause overflow
.filter(move |peer| now.checked_add(peer.ttl()).is_some())
.flat_map(move |peer| {
let observed = self.observed_address();
// Safe to use direct addition since we've filtered out overflow cases
let new_expiration = now + peer.ttl();
peer.addresses().iter().filter_map(move |address| {
let new_addr = _address_translation(address, &observed)?;
let new_addr = new_addr.with_p2p(*peer.id()).ok()?;
Some((*peer.id(), new_addr, new_expiration))
})
})
}
Yes, that's what I was thinking too. I think we can simply do:
self.discovered_peers()
.filter_map(|peer| {
if peer.id() != &local_peer_id {
return None;
}
let new_expiration = now.checked_add(peer.ttl())?;
Some((peer, new_expiration))
}
.flat_map(|(peer, new_expiration)| {
let observed = self.observed_address();
...
}
Hey hey, the initial changes were meant to keep things minimal, but I’m totally on board with simplifying the code. Just a heads up though, the latest one has a pretty critical logic error. If I'm correct the difference being is:
- In the INCORRECT version:
if peer.id() != &local_peer_id { return None; }- This means "if the peer is NOT the local peer, return None"
- This is BACKWARDS from the intent
- It would KEEP the local peer and DISCARD all other peers
- The exact opposite of what we want!
- In the CORRECT version:
.filter(|peer| peer.id() != &local_peer_id)- This explicitly removes the local peer
- Keeps all peers EXCEPT the local peer
- This matches the original function's logic
maan, I hate negations ! 🗡
Looking at the possible filter_map combinations now, the one with just the added filter feels like the cleanest and easiest to follow.
Ah yes, that was a typo, I meant if peer.id() == &local_peer_id { return None; }.
Anyway, let's discuss this on the PR :)
Awesome, really appreciate all your help guys. Will create a PR shortly
Reopening the issue just to track properly how/ when it is resolved.