Using yara-x in a Rust library and handling lifetime specifiers of Scanner

Open xrl1 opened this issue 1 year ago • 6 comments

Hello. This is related to #139, but maybe not the same use case: I'm trying to create a library that uses the yara-x crate. The library should initialize the rules internally and create a struct that holds an instance of the yara-x Scanner.

Reduced code in scanner.rs:

use yara_x::{Rules, Scanner as YaraScanner};
use anyhow::Result;

struct MyScanner<'a> {
    scanner: YaraScanner<'a>,
}

impl<'a> MyScanner<'a> {
    pub fn new(rules: &'a Rules) -> Self {
        let scanner = YaraScanner::new(&rules);
        MyScanner { scanner }
    }

    pub fn scan(&self, data: String) -> Result<String> {
        // Some implementation
        todo!()
    }
}

Reduced code in lib.rs:

pub struct MyLib<'a> {
    scanner: MyScanner<'a>,
}

impl<'a> MyLib<'a> {
    pub fn new() -> Result<Self> {
        let rules: Rules = load_rules()?;
        let scanner = MyScanner::new(&rules);
        Ok(MyLib { scanner })
    }

    pub fn scan(&self, data: String) -> Result<String> {
        self.scanner.scan(data)
    }
}

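load_rules compiles and loads the rules from a resource file.

I tried countless variations of this code, but I always hit the lifetime specifier on Scanner and get the error "rules does not live long enough": rules is a local variable in MyLib::new, so it is dropped when the function returns, while the scanner that borrows it is returned inside MyLib.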

I couldn't find a way to wrap YaraScanner in an object that outlives it and holds a Rules object safely.

I cannot create the rules in main.rs because I intend to export this as a library, and I don't want the user to load the Yara rules herself.

The only solution Claude Sonnet and I found was to Box::leak this memory or load it statically, so that it lives until the program exits. I want to avoid that, because in the future I want MyLib::new to accept rule strings as arguments, so the rules should live only as long as the MyLib instance.

Please let me know how you think it can be solved, because currently, I think only changing Scanner to take ownership of the rules can solve this.

xrl1, Sep 01 '24 10:09

I believe you can achieve what you want with a bit of unsafe code:

use std::marker::PhantomPinned;
use std::pin::Pin;

use anyhow::Result;

/// Wraps a yara_x::Rules, preventing it from moving around in memory.
struct PinnedRules{
    rules: yara_x::Rules,
    _pin: PhantomPinned,
}

struct MyScanner<'a> {
    scanner: yara_x::Scanner<'a>,
    // This allows MyScanner to own the yara_x::Rules and pass a reference to the
    // scanner. The use of `Pin` guarantees that the rules won't be moved.
    _rules: Pin<Box<PinnedRules>>,
}

impl<'a> MyScanner<'a> {
    pub fn new(rules: yara_x::Rules) -> Self {
        let pinned_rules = Box::pin(PinnedRules{rules, _pin: PhantomPinned});
        let rules_ptr = std::ptr::from_ref(&pinned_rules.rules);
        let rules_ref = unsafe { rules_ptr.as_ref().unwrap() };
        let scanner = yara_x::Scanner::new(rules_ref);

        Self { scanner, _rules: pinned_rules }
    }

    pub fn scan(&mut self, data: String) -> Result<String> {
        todo!()
    }
}

I haven't tested it thoroughly, so it may contain bugs.
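The same idea can be tried out without the yara-x crate. Below is a minimal sketch, assuming hypothetical std-only stand-ins named Rules and Scanner<'r> that only mimic the borrowing shape of the real types. The wrapper name OwningScanner is also made up for illustration. It differs from the code above in one way: the lifetime is fixed to 'static internally instead of being exposed as 'a, which is only reasonable as long as the wrapper never hands out anything that borrows from the rules.

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;

/// Stand-in for `yara_x::Rules` (hypothetical, std-only).
pub struct Rules {
    pub needles: Vec<String>,
}

/// Stand-in for `yara_x::Scanner<'r>`: it only borrows the rules.
pub struct Scanner<'r> {
    rules: &'r Rules,
}

impl<'r> Scanner<'r> {
    pub fn new(rules: &'r Rules) -> Self {
        Scanner { rules }
    }

    /// Returns owned copies of every needle found in `data`.
    pub fn scan(&mut self, data: &str) -> Vec<String> {
        self.rules
            .needles
            .iter()
            .filter(|n| data.contains(n.as_str()))
            .cloned()
            .collect()
    }
}

/// Heap-allocated rules that are never moved once pinned.
struct PinnedRules {
    rules: Rules,
    _pin: PhantomPinned,
}

pub struct OwningScanner {
    // Field order matters: fields drop in declaration order, so the
    // scanner is dropped before the rules it points to.
    scanner: Scanner<'static>,
    _rules: Pin<Box<PinnedRules>>,
}

impl OwningScanner {
    pub fn new(rules: Rules) -> Self {
        let pinned = Box::pin(PinnedRules { rules, _pin: PhantomPinned });
        let ptr: *const Rules = &pinned.rules;
        // SAFETY (assumption of this sketch): the rules live on the heap,
        // are never moved or mutated, and outlive `scanner` because of the
        // field order above. The `'static` reference never leaves this type.
        let rules_ref: &'static Rules = unsafe { &*ptr };
        OwningScanner {
            scanner: Scanner::new(rules_ref),
            _rules: pinned,
        }
    }

    /// Only owned data is returned, so the fake `'static` cannot escape.
    pub fn scan(&mut self, data: &str) -> Vec<String> {
        self.scanner.scan(data)
    }
}
```

Moving an OwningScanner (for example into a Box or a Vec) moves only the Box pointer, not the pinned rules, so the scanner's reference stays valid. With the real crate, any scan result type that borrows from the rules would need to be converted to owned data before leaving the wrapper.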

plusvic, Sep 01 '24 15:09

Thank you, I tested this change in my code, all the tests passed, and nothing panics.

Even though it works, I think this solution is suboptimal - I need to test it more thoroughly, and I'll deep-dive into std::pin docs to make sure this unsafe code won't crash in the future, won't memory-leak, and there isn't any race in the destructor of MyScanner that may cause invalid memory access.

May I still suggest handling this issue in the yara-x library sometime in the future - to avoid forcing the library user to write unsafe code, or to introduce advanced Rust concepts.

xrl1, Sep 01 '24 20:09

@xrl1 the only thing you have to do is put Rules within your scanner struct, so that the Rust compiler knows its lifetime doesn't expire before the struct is dropped. It means your MyScanner needs to own your Rules.

pub struct MyScanner<'s> {
    rules: yara_x::Rules,
    scanner: Option<yara_x::Scanner<'s>>,
}

qjerome, Sep 02 '24 07:09

@qjerome that doesn't work because yara_x::Scanner needs a reference to the rules in MyScanner.rules. You can create a scanner that receives that reference, but you can't move it into MyScanner.scanner, because the struct would then hold a reference into itself.
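One way to sidestep the self-reference without unsafe code is to store only the rules and build a scanner inside each scan call. The following is a minimal sketch of that shape, assuming hypothetical std-only stand-ins (Rules, Scanner<'r>) rather than the real yara-x types. Whether constructing a scanner per call is cheap enough with the real crate is not something this thread establishes.

```rust
/// Stand-in for `yara_x::Rules` (hypothetical, std-only).
pub struct Rules {
    patterns: Vec<String>,
}

impl Rules {
    /// Hypothetical helper standing in for rule compilation.
    pub fn compile(patterns: &[&str]) -> Self {
        Rules {
            patterns: patterns.iter().map(|p| p.to_string()).collect(),
        }
    }
}

/// Stand-in for `yara_x::Scanner<'r>`: it only borrows the rules.
pub struct Scanner<'r> {
    rules: &'r Rules,
}

impl<'r> Scanner<'r> {
    pub fn new(rules: &'r Rules) -> Self {
        Scanner { rules }
    }

    /// Returns owned copies of every pattern found in `data`.
    pub fn scan(&mut self, data: &str) -> Vec<String> {
        self.rules
            .patterns
            .iter()
            .filter(|p| data.contains(p.as_str()))
            .cloned()
            .collect()
    }
}

/// The library type owns the rules and has no lifetime parameter.
pub struct MyLib {
    rules: Rules,
}

impl MyLib {
    pub fn new(patterns: &[&str]) -> Self {
        MyLib { rules: Rules::compile(patterns) }
    }

    pub fn scan(&self, data: &str) -> Vec<String> {
        // The scanner borrows `self.rules` only for the duration of this
        // call, so no reference is ever stored next to the data it points to.
        let mut scanner = Scanner::new(&self.rules);
        scanner.scan(data)
    }
}
```

Because MyLib stores no borrowed data, it can be moved, returned from functions, and dropped like any ordinary owned value.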

plusvic, Sep 02 '24 08:09

My bad, I thought it would work! That's what you get when you write code without testing it...

qjerome, Sep 02 '24 08:09

Let me add a non-null contribution this time:

use std::ops::{Deref, DerefMut};

impl<'s> Deref for MyScanner<'s> {
    type Target = yara_x::Scanner<'s>;
    fn deref(&self) -> &Self::Target {
        &self.scanner
    }
}

impl<'s> DerefMut for MyScanner<'s> {
    fn deref_mut(&mut self) -> &mut Self::Target {
        &mut self.scanner
    }
}

This should allow you to use your MyScanner as a yara_x::Scanner.

qjerome, Sep 02 '24 09:09

I was also doing this and ran into the exact same issue... the lifetimes in yara-x are kind of complex when you use it as a Rust crate. I've spent a few hours trying to appease all of the yara-x lifetimes while implementing a similar design.

Here's the ideal design for a scanner impl in my brain:

  • ScanEngine::new(): Compiler::new()
  • ScanEngine.load_definitions(path): add definitions
  • ScanEngine.start(): rules = compiler.build(), scanner = Scanner::new(rules)
  • ScanEngine.scan_file(path: PathBuf), etc.

A lot of people are going to gravitate to this usage of yara-x since it makes the most logical sense. However, it seems like anyone who tries to pack Compiler / Rules / Scanner into a struct is going to run into some serious trouble and fall back to unsafe pins.
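The design in the list above can be approximated without unsafe code if the engine stores either the compiler or the built rules, and creates a scanner on demand. This is a minimal sketch using hypothetical std-only stand-ins (Compiler, Rules, Scanner<'r>), not the real yara-x API. Here load_definitions takes rule text rather than a path, and scan takes a string rather than a file, purely to keep the example self-contained.

```rust
/// Stand-in for `yara_x::Compiler` (hypothetical, std-only).
#[derive(Default)]
pub struct Compiler {
    sources: Vec<String>,
}

impl Compiler {
    pub fn new() -> Self {
        Compiler::default()
    }
    pub fn add_source(&mut self, src: &str) {
        self.sources.push(src.to_string());
    }
    pub fn build(self) -> Rules {
        Rules { patterns: self.sources }
    }
}

/// Stand-in for `yara_x::Rules`.
pub struct Rules {
    patterns: Vec<String>,
}

/// Stand-in for `yara_x::Scanner<'r>`: it only borrows the rules.
pub struct Scanner<'r> {
    rules: &'r Rules,
}

impl<'r> Scanner<'r> {
    pub fn new(rules: &'r Rules) -> Self {
        Scanner { rules }
    }
    pub fn scan(&mut self, data: &str) -> Vec<String> {
        self.rules
            .patterns
            .iter()
            .filter(|p| data.contains(p.as_str()))
            .cloned()
            .collect()
    }
}

/// The engine is either still collecting definitions or ready to scan.
enum State {
    Compiling(Compiler),
    Ready(Rules),
}

pub struct ScanEngine {
    state: State,
}

impl ScanEngine {
    pub fn new() -> Self {
        ScanEngine { state: State::Compiling(Compiler::new()) }
    }

    pub fn load_definitions(&mut self, src: &str) -> Result<(), String> {
        match &mut self.state {
            State::Compiling(c) => {
                c.add_source(src);
                Ok(())
            }
            State::Ready(_) => Err("engine already started".to_string()),
        }
    }

    /// Consumes the compiler and keeps only the built rules.
    pub fn start(&mut self) {
        if let State::Compiling(c) = &mut self.state {
            let compiler = std::mem::take(c);
            self.state = State::Ready(compiler.build());
        }
    }

    pub fn scan(&self, data: &str) -> Result<Vec<String>, String> {
        match &self.state {
            // The scanner is a temporary that borrows the rules for this call.
            State::Ready(rules) => Ok(Scanner::new(rules).scan(data)),
            State::Compiling(_) => Err("call start() first".to_string()),
        }
    }
}
```

The trade-off is that no scanner is kept between calls. Holding one long-lived scanner next to its rules is exactly the self-referential case discussed earlier in the thread.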

kallisti5, Mar 11 '25 15:03