goblin icon indicating copy to clipboard operation
goblin copied to clipboard

PE: Add delay import parser

Open kkent030315 opened this issue 6 months ago • 7 comments

Added delay-load import directory parser for PE32/64.

Using Vec as well as existing normal import parser. The design decision behind that Vec is that each import entry requires RVA resolution. I'd say it is not a good design, as both normal and delay import parser runs fully upon goblin::pe::PE::parse that can raise malformed errors -- even if the consumer doesn't need that import info.

I guess almost entire parsing logic can be moved to TryFromCtx if we could build some specific scroll ctx with a hell of lifetimes, so that utils::find_offset became callable inside the TryFromCtx. That's too much hustle but it should theoretically be possible.

/// A binary parsing context for PE parser, including the container size, underlying byte endianness and params for resolving RVAs
#[derive(Debug, Copy, Clone, PartialEq)]
pub struct PeCtx<'a> {
  pub container: Container,
  pub le: scroll::Endian,

  pub sections: &[section_table::SectionTable],
  pub file_alignment: u32,
  pub opts: &options::ParseOptions,
  // full binary view that arbitrary RVA could point to
  pub bytes: &'a [u8],
}
impl<'a> Iterator for DelayImportEntryIterator<'a>
impl<'a> Iterator for DelayImportDescriptorIterator<'a>

Ref:

  • https://www.hexacorn.com/blog/2019/06/03/playing-with-delay-loaded-dlls/

kkent030315 avatar Jun 06 '25 19:06 kkent030315

I made a brief example implementation of my idea to have PeCtx in 2fd0a44. Consider the code an experimental code that aren't ready to ship.

I've added custom Ctx:

/// A binary parsing context for PE parser
#[derive(Debug, Copy, Clone, PartialEq)]
pub struct PeCtx<'a> {
    pub container: Container,
    pub le: scroll::Endian,
    pub sections: &'a [section_table::SectionTable],
    pub file_alignment: u32,
    pub opts: options::ParseOptions,
    pub bytes: &'a [u8], // full binary view
}

The issue is that we cannot acrually borrow the &'a sections in DelayImportData<'a>. We cannot make sections: Vec<section_table::SectionTable> because Ctx must implement Copy that Vec doesn't.

The solution to that problem is to require consumers provide sections when they needed to get delay import info.

impl<'a> DelayImportDll<'a> {
    pub fn functions(
        &self,
        sections: &'a [section_table::SectionTable],
    ) -> DelayImportFunctionIterator<'a> {
        // Replace sections with an actual sections ref
        let pectx = PeCtx::new(
            self.ctx.container,
            self.ctx.le,
            &sections,
            self.ctx.file_alignment,
            self.ctx.opts,
            self.ctx.bytes,
        );
        DelayImportFunctionIterator {
            ctx: pectx,
            bytes: self.bytes,
            offset: 0,
            descriptor: self.descriptor,
        }
    }
}

impl<'a> DelayImportData<'a> {
    pub fn dlls(&self, sections: &'a [section_table::SectionTable]) -> DelayImportDllIterator<'a> {
        // Replace sections with an actual sections ref
        let pectx = PeCtx::new(
            self.ctx.container,
            self.ctx.le,
            &sections,
            self.ctx.file_alignment,
            self.ctx.opts,
            self.ctx.bytes,
        );
        DelayImportDllIterator {
            ctx: pectx,
            bytes: self.bytes,
            offset: 0,
        }
    }
}

I think, the best is to have an offset of section table and parsing it when needed. So that we eliminate the odd sections parameter in public APIs.

/// A binary parsing context for PE parser
#[derive(Debug, Copy, Clone, PartialEq)]
pub struct PeCtx<'a> {
    pub container: Container,
    pub le: scroll::Endian,
-   pub sections: &'a [section_table::SectionTable],
+   // let sections = ctx.bytes.pread_with(ctx.sections_offset, scroll::LE)?;
+   pub sections_offset: usize, 
    pub file_alignment: u32,
    pub opts: options::ParseOptions,
    pub bytes: &'a [u8], // full binary view
}

Otherwise it looks very clean design for me. It makes parsing of delay imports deferred to when it's needed.

let res = goblin::pe::PE::parse(&buffer)?;
let di = res.delay_import_data.unwrap();
let sections = &res.sections;
let delayimports = di
  .dlls(&sections)
  .filter_map(Result::ok)
  .flat_map(|x| x.functions(&sections))
  .filter_map(Result::ok)
  .collect::<Vec<_>>();

kkent030315 avatar Jun 09 '25 15:06 kkent030315

can you rebase, CI had an issue on nightly

m4b avatar Jun 16 '25 04:06 m4b

Have not fully read over the PeCtx idea, looks interesting though; few initial notes:

  1. why is

The issue is that we cannot acrually borrow the &'a sections in DelayImportData<'a>

the case? is it some kind of cycle issue or? 2. for

        let pectx = PeCtx::new(
            self.ctx.container,
            self.ctx.le,
            &sections,
            self.ctx.file_alignment,
            self.ctx.opts,
            self.ctx.bytes,
        );

you can rewrite this as:

let pectx = PeCtx { sections: &sections, ..self.ctx) }

as long as all fields of PeCtx are accessible in the use context (e.g., public, or within the file if private, module if pub(crate)). In other languages this is called a "functional record update" or something to that effect, ocaml has a similar syntactic mechanism, for example.

m4b avatar Jun 16 '25 05:06 m4b

the case? is it some kind of cycle issue or?

Nah it's the issue where sections is Vec in PE::parse that cannot outlives. &'a bytes has longer lifetime because it is being passed from consumer.

https://github.com/m4b/goblin/blob/f1b6cae5d5f474ecefc60298e15b90f82b979e39/src/pe/mod.rs#L51-L52 https://github.com/m4b/goblin/blob/f1b6cae5d5f474ecefc60298e15b90f82b979e39/src/pe/mod.rs#L107

kkent030315 avatar Jun 16 '25 09:06 kkent030315

sorry this didn't make it in for 0.10.1, but it does need a rebase; i did have some concerns about this, as it looked a little complicated, but i'll just have to re-review again when it's ready. Also I can't remember if it had breaking changes, so if it did it wouldn't have made it into 0.10.1; with this, I think i've caught up to your absolutely massive amount of PRs :D

m4b avatar Aug 15 '25 03:08 m4b

@m4b This PR needs a direction on which way this PR should go with: A) traditional Vec or B) new PeCtx<'a> and "deferred" iterators. I personally prefer challenging on PeCtx<'a> that might also be applied to import/export parser rework in near future. WDYT?

edit: current patch includes both case.

kkent030315 avatar Aug 15 '25 10:08 kkent030315

Once #239 gets merged we could add section_table: &'a [u8] in PeCtx that will get parsed inside FromCtx and such.

/// A binary parsing context for PE parser, including the container size, underlying byte endianness and params for resolving RVAs
#[derive(Debug, Copy, Clone, PartialEq)]
pub struct PeCtx<'a> {
  pub container: Container,
  pub le: scroll::Endian,
  pub section_table: &'a [u8],
  pub file_alignment: u32,
  pub opts: &options::ParseOptions,
  pub bytes: &'a [u8], // full binary view
}
impl<'a> ctx::TryFromCtx<'a, (DelayImportDescriptor, PeCtx<'a>)> for DelayImportFunction<'a> {
    type Error = crate::error::Error;
    fn try_from_ctx(
        bytes: &'a [u8],
        ctx: (DelayImportDescriptor, PeCtx<'a>),
    ) -> error::Result<(Self, usize)> {
        let sections_it = SectionTableIterator { // Fused, ExactSize
            data: ctx.section_table, // &'a [u8]
        };
        // Actually doesnt need to alloc here, passing iterator to utils::find_offset_ex directly or smth
        let sections: Vec<SectionHeader> = sections_it.collect<Result<Vec<_>>>()?;
        // Or even `ctx.section_table.pread::<&[SectionHeader]>(0)` maybe?
        let name = if is_ordinal {
            None
        } else {
            let dll_name_rva = descriptor.name_rva;
            let dll_name_offset = utils::find_offset(
                dll_name_rva as usize,
                sections,
                ctx.file_alignment,
                &ctx.opts,
            )
            .ok_or_else(|| {
                error::Error::Malformed(format!(
                    "cannot map delay import dll name rva {dll_name_rva:#x}"
                ))
            })?;
            let dll_name = ctx.bytes.pread::<&'a str>(dll_name_offset)?;
            Some(dll_name)
        };
		// ...
    }
}

kkent030315 avatar Aug 15 '25 14:08 kkent030315