rust-libp2p icon indicating copy to clipboard operation
rust-libp2p copied to clipboard

Bug: Arithmetic overflow in mDNS implementation when adding TTL to Instant

Open momoshell opened this issue 9 months ago • 7 comments

Summary

Description

On the Avail Light Client network, we've identified a potential panic in the mDNS implementation when processing response packets with extremely large TTL values. The issue occurs due to an arithmetic overflow when adding a TTL duration to an Instant.

Root Cause

The issue was identified in the extract_discovered method in the query.rs file:

let new_expiration = now + peer.ttl();

Expected behavior

The quoted vulnerable line above should be able to cap the TTL to a reasonable maximum value [say 24 hours].

Actual behavior

This unchecked arithmetic panics when an mDNS packet contains an unusually large TTL value, possibly due to receiving malformed or malicious mDNS packets.

Relevant log output

Error: The application panicked (crashed).
Message:  overflow when adding duration to instant
Location: library/std/src/time.rs:417

  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ BACKTRACE ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
                                ⋮ 6 frames hidden ⋮                               
   7: core::option::expect_failed::h8c40d2d654ba3611
      at <unknown source file>:<unknown line>
   8: tokio::time::interval::Interval::poll_tick::hc1814dc20e3605e4
      at <unknown source file>:<unknown line>
   9: <libp2p_mdns::behaviour::Behaviour<P> as libp2p_swarm::behaviour::NetworkBehaviour>::poll::ha2bef1b1e29f0bbc

Possible Solution

Replace the vulnerable line with a safer implementation that caps the TTL to a reasonable maximum value:

// Cap TTL to a reasonable maximum (e.g., 24 hours)
let capped_ttl = std::cmp::min(peer.ttl(), Duration::from_secs(86400)); // 86400 seconds = 24 hours
let new_expiration = now + capped_ttl;

Another approach could be to use checked_add with a fallback:

let new_expiration = now.checked_add(peer.ttl())
    .unwrap_or_else(|| now + Duration::from_secs(3600)); // Fallback to 1 hour if overflow

Version

0.55.0

Would you like to work on fixing this bug?

Yes

momoshell avatar Mar 21 '25 19:03 momoshell

Hey! Nice catch on the possible overflow. While I am not against having it fixed to a specific value if it does overflow, I am curious on if we should also have this apart of libp2p spec as well so it would possibly be more consistent in such cases where there is a overflow.

dariusc93 avatar Mar 21 '25 23:03 dariusc93

If that overflow is due to a malformed or malicious packet do we even want to handle the content? IMO it's also reasonable to just filter out a peer in a response if the ttl is invalid.

elenaf9 avatar Mar 23 '25 05:03 elenaf9

Hey guys, thanks for the prompt responses, really appreciate it. I agree, the extract_discovered function already does the filtering and seems like the perfect place to add a filter combinator.

Without defining fallback TTL values to compare to, would something like this be a suitable change for the response handling?

pub(crate) fn extract_discovered(
    &self,
    now: Instant,
    local_peer_id: PeerId,
) -> impl Iterator<Item = (PeerId, Multiaddr, Instant)> + '_ {
    self.discovered_peers()
        .filter(move |peer| peer.id() != &local_peer_id)
        // Filter out peers whose TTL would cause overflow
        .filter(move |peer| now.checked_add(peer.ttl()).is_some())
        .flat_map(move |peer| {
            let observed = self.observed_address();
            // Safe to use direct addition since we've filtered out overflow cases
            let new_expiration = now + peer.ttl();

            peer.addresses().iter().filter_map(move |address| {
                let new_addr = _address_translation(address, &observed)?;
                let new_addr = new_addr.with_p2p(*peer.id()).ok()?;

                Some((*peer.id(), new_addr, new_expiration))
            })
        })
}

momoshell avatar Mar 23 '25 09:03 momoshell

Yes, that's what I was thinking too. I think we can simply do:

self.discovered_peers()
        .filter_map(|peer| {
            if peer.id() != &local_peer_id {
                return None;
            }
            let new_expiration = now.checked_add(peer.ttl())?;
            Some((peer, new_expiration))
        }
        .flat_map(|(peer, new_expiration)| {
             let observed = self.observed_address();
             ...
        }

elenaf9 avatar Mar 26 '25 08:03 elenaf9

Hey hey, the initial changes were meant to keep things minimal, but I’m totally on board with simplifying the code. Just a heads up though, the latest one has a pretty critical logic error. If I'm correct the difference being is:

  1. In the INCORRECT version:
  • if peer.id() != &local_peer_id { return None; }
  • This means "if the peer is NOT the local peer, return None"
  • This is BACKWARDS from the intent
  • It would KEEP the local peer and DISCARD all other peers
  • The exact opposite of what we want!
  1. In the CORRECT version:
  • .filter(|peer| peer.id() != &local_peer_id)
  • This explicitly removes the local peer
  • Keeps all peers EXCEPT the local peer
  • This matches the original function's logic

maan, I hate negations ! 🗡

Looking at the possible filter_map combinations now, the one with just the added filter feels like the cleanest and easiest to follow.

momoshell avatar Mar 26 '25 08:03 momoshell

Ah yes, that was a typo, I meant if peer.id() == &local_peer_id { return None; }.

Anyway, let's discuss this on the PR :)

elenaf9 avatar Mar 26 '25 09:03 elenaf9

Awesome, really appreciate all your help guys. Will create a PR shortly

momoshell avatar Mar 26 '25 09:03 momoshell

Reopening the issue just to track properly how/ when it is resolved.

elenaf9 avatar Mar 26 '25 09:03 elenaf9