DyldCache: add the ability to iterate mappings and relocations

Open scollinson opened this issue 1 year ago • 11 comments

Add mapping and relocation iterators to the DyldCache class that handle modern dyld shared cache formats.

scollinson avatar Oct 22 '24 02:10 scollinson

What testing have you done for this? Are you able to show me some code that demonstrates how all of this is used?

philipc avatar Oct 22 '24 23:10 philipc

What testing have you done for this? Are you able to show me some code that demonstrates how all of this is used?

For testing I have been using a very simple script that runs the functions I am after:

use anyhow::{Context, Result};
use memmap2::Mmap;
use object::macho;
use object::read::macho::DyldCache;
use object::Endianness;
use std::fs;
use std::mem::forget;
use std::path::{Path, PathBuf};

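// Map a cache file into memory and hand back a `'static` slice.
// The mapping is deliberately leaked so it stays valid for the whole run,
// which is acceptable for a short-lived test script.
fn map(path: PathBuf) -> Result<&'static [u8]> {
    let file = fs::File::open(path).context("failed to open cache path")?;
    let file = unsafe { Mmap::map(&file) }.context("failed to map cache file")?;
    let data = &*file as *const [u8];
    // Never run the `Mmap` destructor, so the pages are never unmapped.
    forget(file);
    let data = unsafe { data.as_ref() }.context("cache map is null")?;
    Ok(data)
}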

fn main() -> Result<()> {
    let cache_root = Path::new("/System/Volumes/Preboot/Cryptexes/OS/System/Library/dyld");

    let cache = map(cache_root.join("dyld_shared_cache_arm64e"))?;
    let subcaches = &[map(cache_root.join("dyld_shared_cache_arm64e.01"))?];

    let cache = DyldCache::<Endianness>::parse(cache, subcaches)?;

    for reloc in cache.relocations() {
        if let Some(ref auth) = reloc.auth {
            match (auth.key, auth.diversity, auth.addr_div) {
                (macho::PtrauthKey::IA, 0u16, false) => {
                    dbg!(reloc);
                }
                _ => {}
            }
        }
    }

    for mapping in cache.mappings() {
        dbg!(mapping);
    }

    Ok(())
}

I'm not sure how to test this in CI, as the cache files are prohibitively large to put in the test binaries repo. I've done a lot of manual comparison with the output of the following command:

    ipsw dyld slide /System/Volumes/Preboot/Cryptexes/OS/System/Library/dyld/dyld_shared_cache_arm64e --json --auth | jq '.[] | select(.pointer.authenticated == true and .pointer.key == "IA" and .pointer.addr_div == false and .pointer.diversity == 0)'

scollinson avatar Oct 23 '24 02:10 scollinson

I'm having trouble understanding the use of DyldCache::mappings and DyldCache::relocations. Why do you want to add these? For example, I don't see how the relocation returned by this iterator could be used, because its offset is relative to a mapping, but the iterator discards all association with that mapping.

philipc avatar Oct 24 '24 01:10 philipc

Oh yes, I can see why that's a little awkward. I should add the address to the DyldRelocation. My goal is to use the mappings (address, protection, data) and relocations (taints, signatures) in a symbolic execution project.
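For illustration, attaching the address could look roughly like the sketch below. Mapping, Reloc and AddressedReloc are hypothetical stand-ins, not the types in this PR; the only assumption taken from the discussion is that a relocation's offset is relative to the start of its mapping.

```rust
/// Hypothetical stand-in for a relocation whose offset is mapping-relative.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Reloc {
    pub offset: u64,
}

/// Hypothetical stand-in for a mapping: a base address plus its relocations.
#[derive(Debug, Clone, Copy)]
pub struct Mapping<'data> {
    pub address: u64,
    pub relocs: &'data [Reloc],
}

/// A relocation paired with the absolute address it applies to.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct AddressedReloc {
    pub address: u64,
    pub reloc: Reloc,
}

impl<'data> Mapping<'data> {
    /// Yield each relocation together with `mapping address + offset`.
    pub fn relocations(self) -> impl Iterator<Item = AddressedReloc> + 'data {
        let base = self.address;
        self.relocs.iter().map(move |&reloc| AddressedReloc {
            // wrapping_add keeps the sketch panic-free on malformed input.
            address: base.wrapping_add(reloc.offset),
            reloc,
        })
    }
}
```

With the address carried on each item, a caller that flattens relocations across mappings no longer needs to remember which mapping an item came from.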

scollinson avatar Oct 24 '24 03:10 scollinson

I've looked a bit more and DyldCache::mappings is probably okay. I was thinking we exposed the subcaches but it seems we don't. That's something we could consider though (add DyldCache::subcaches instead of DyldCache::mappings).

I still don't see the reason for DyldCache::relocations though. Why can't the user call DyldCache::mappings and then call DyldCacheMapping::relocations for each mapping?

philipc avatar Oct 24 '24 04:10 philipc

I still don't see the reason for DyldCache::relocations though. Why can't the user call DyldCache::mappings and then call DyldCacheMapping::relocations for each mapping?

They can do both, and DyldCache::relocations is convenient for when you don't want to know anything about the mappings, though it is a little useless in that case without the address, as you've pointed out. I'll add this now.

scollinson avatar Oct 24 '24 05:10 scollinson

DyldCache::relocations is convenient for when you don't want to know anything about the mappings

Can we instead design it so that it's easy for the user to do something like cache.mappings().flat_map(DyldCacheMapping::relocations)? Turning nested iterators into a boxed chain of iterators is the wrong way to do this.

philipc avatar Oct 24 '24 07:10 philipc

Sorry, Rust is quite new for me and I'm not sure what is wrong with that, or what the best approach is. I thought the requirement to box the iterators came from the mappings optionally containing slide information (depending on the version of the mapping info, or the mapping itself), not from the iterators being nested?
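On the optional slide information specifically, one possible way to avoid a Box is to lean on the fact that Option is itself iterable. The sketch below uses hypothetical stand-in types (Mapping with an optional slice of Reloc), not the PR's real structures, and only shows the shape of the idea.

```rust
/// Hypothetical stand-in for a relocation.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Reloc {
    pub offset: u64,
}

/// Hypothetical stand-in for a mapping whose slide info may be absent.
#[derive(Debug, Clone, Copy)]
pub struct Mapping<'data> {
    pub slide: Option<&'data [Reloc]>,
}

impl<'data> Mapping<'data> {
    /// `Option<I>` iterates over zero or one item, so flattening it gives a
    /// single concrete iterator type for both cases, with no `Box`.
    pub fn relocations(self) -> impl Iterator<Item = Reloc> + 'data {
        self.slide
            .map(|relocs| relocs.iter().copied())
            .into_iter()
            .flatten()
    }
}
```

When there are several genuinely different slide formats, an enum with one variant per format that implements Iterator is another allocation-free option, at the cost of more code.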

scollinson avatar Oct 26 '24 05:10 scollinson

Removing the reference from self for DyldCacheMapping::relocations seems to make your example with flatten work. So I presume the idea from there is to remove DyldCache::relocations?

DyldCache::mappings would still return a boxed iterator. Is that also an issue?
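As far as I can tell, taking self by value helps because the adapter hands each mapping to the function by value: an iterator that borrowed from that temporary could not outlive the call, whereas one that only borrows the underlying data can. A minimal sketch with hypothetical stand-in types (Cache, Mapping, Reloc), not the real DyldCache API:

```rust
/// Hypothetical stand-in for a relocation.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Reloc {
    pub offset: u64,
}

/// Hypothetical stand-in for a mapping; cheap to copy, borrows only the data.
#[derive(Debug, Clone, Copy)]
pub struct Mapping<'data> {
    pub relocs: &'data [Reloc],
}

impl<'data> Mapping<'data> {
    /// By-value `self`: the returned iterator is tied to `'data`,
    /// not to a borrow of this (possibly temporary) `Mapping`.
    pub fn relocations(self) -> impl Iterator<Item = Reloc> + 'data {
        self.relocs.iter().copied()
    }
}

/// Hypothetical stand-in for the cache.
pub struct Cache<'data> {
    pub mappings: Vec<Mapping<'data>>,
}

impl<'data> Cache<'data> {
    /// Yields owned `Mapping` values, so callers can pass them on by value.
    pub fn mappings<'cache>(&'cache self) -> impl Iterator<Item = Mapping<'data>> + 'cache {
        self.mappings.iter().copied()
    }
}
```

With this shape, cache.mappings().flat_map(Mapping::relocations) and the equivalent .map(Mapping::relocations).flatten() both type-check in the caller's code.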

scollinson avatar Oct 26 '24 05:10 scollinson

It's doing a bunch of memory allocations for something that shouldn't need any memory allocations at all. So while it works, it seems to me that it could be designed better.

So DyldCache::mappings could be changed to this:

    pub fn mappings<'cache>(
        &'cache self,
    ) -> impl Iterator<Item = DyldCacheMapping<'data, E, R>> + 'cache {
        self.mappings.iter().chain(self.subcaches.iter().flat_map(|subcache| subcache.mappings.iter()))
    }

I did try doing the same thing for DyldCache::relocations, but we need to name the 'data lifetime somehow and I'm not sure how. I can look into it more later.

    pub fn relocations<'cache>(&'cache self) -> impl Iterator<Item = DyldRelocation> + 'cache {
        self.mappings().flat_map(|mapping| mapping.relocations())
    }

Or we could leave out DyldCache::relocations, but I would prefer to understand it better first.

I'm sure we could replace the impl Iterator with our own type, but that's a bit more verbose.
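To give a feel for the verbosity, here is a rough sketch of what a named iterator type might look like. Everything in it (Cache, Mapping, Reloc, Relocations) is a hypothetical stand-in over plain slices, not the real DyldCache internals.

```rust
use std::slice;

/// Hypothetical stand-in for a relocation.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Reloc {
    pub offset: u64,
}

/// Hypothetical stand-in for a mapping.
#[derive(Debug, Clone, Copy)]
pub struct Mapping<'data> {
    pub relocs: &'data [Reloc],
}

/// Hypothetical stand-in for the cache.
pub struct Cache<'data> {
    pub mappings: Vec<Mapping<'data>>,
}

/// Named iterator over every relocation in every mapping.
pub struct Relocations<'cache, 'data> {
    mappings: slice::Iter<'cache, Mapping<'data>>,
    current: slice::Iter<'data, Reloc>,
}

impl<'cache, 'data> Iterator for Relocations<'cache, 'data> {
    type Item = Reloc;

    fn next(&mut self) -> Option<Reloc> {
        loop {
            if let Some(reloc) = self.current.next() {
                return Some(*reloc);
            }
            // Current mapping exhausted: move on, or stop when none remain.
            let relocs: &'data [Reloc] = self.mappings.next()?.relocs;
            self.current = relocs.iter();
        }
    }
}

impl<'data> Cache<'data> {
    /// Both lifetimes appear in the return type, so nothing is hidden.
    pub fn relocations<'cache>(&'cache self) -> Relocations<'cache, 'data> {
        let empty: &'data [Reloc] = &[];
        Relocations {
            mappings: self.mappings.iter(),
            current: empty.iter(),
        }
    }
}
```

The upside of spelling the type out is that both lifetimes are named explicitly and callers can store the iterator in a struct field; the downside is the extra boilerplate compared with flat_map.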

philipc avatar Oct 26 '24 06:10 philipc

Have pushed a change so that the following now works:

cache
    .mappings()
    .map(DyldCacheMapping::relocations)
    .flatten()

But I am not sure how to reimplement DyldCache::relocations, for the same reason you gave.

scollinson avatar Oct 26 '24 11:10 scollinson

The 'data error is a known limitation of Rust that will be fixed in the 2024 edition. Using a workaround, we could do this:

pub trait Captures<'a> { }
impl<'a, T: ?Sized> Captures<'a> for T { }

impl<'data, E, R> DyldCache<'data, E, R>
where
    E: Endian,
    R: ReadRef<'data>,
{
    /// Return all the relocations in this cache.
    pub fn relocations<'cache>(
        &'cache self,
    ) -> impl Iterator<Item = DyldRelocation> + Captures<'cache> + Captures<'data> {
        self.mappings().flat_map(DyldCacheMapping::relocations)
    }
}

Defining our own iterator instead of using flat_map would also work around it, but we'd have to define an iterator for the mappings too.

I'd prefer to just leave this out instead of doing workarounds. Is that okay? I don't think that writing the flat_map yourself is too hard.
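For readers on newer toolchains: to my knowledge, Rust 1.82 stabilized precise capturing (the use<..> bound), which can state the captured lifetimes directly without a helper trait. The sketch below assumes such a toolchain and uses hypothetical stand-in types (Cache, Mapping, Reloc) rather than the real DyldCache, so treat it as an illustration of the syntax only.

```rust
/// Hypothetical stand-in for a relocation.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Reloc {
    pub offset: u64,
}

/// Hypothetical stand-in for a mapping.
#[derive(Debug, Clone, Copy)]
pub struct Mapping<'data> {
    pub relocs: &'data [Reloc],
}

impl<'data> Mapping<'data> {
    pub fn relocations(self) -> impl Iterator<Item = Reloc> + 'data {
        self.relocs.iter().copied()
    }
}

/// Hypothetical stand-in for the cache.
pub struct Cache<'data> {
    pub mappings: Vec<Mapping<'data>>,
}

impl<'data> Cache<'data> {
    pub fn mappings<'cache>(&'cache self) -> impl Iterator<Item = Mapping<'data>> + 'cache {
        self.mappings.iter().copied()
    }

    /// `use<'cache, 'data>` declares that the hidden iterator type may
    /// mention both lifetimes, even though `Item = Reloc` names neither.
    pub fn relocations<'cache>(
        &'cache self,
    ) -> impl Iterator<Item = Reloc> + use<'cache, 'data> {
        self.mappings().flat_map(Mapping::relocations)
    }
}
```

This would raise the minimum supported Rust version considerably, so it is more of a pointer for later than a suggestion for this change.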

philipc avatar Oct 27 '24 06:10 philipc