lua-patterns icon indicating copy to clipboard operation
lua-patterns copied to clipboard

Possible to cause a panic with gsub

Open ghost opened this issue 4 years ago • 6 comments

use lua_patterns::LuaPattern;

fn main() {
    let mut m = LuaPattern::from_bytes(&[160]);
    let s = "⚠";
    s.bytes().for_each(|b| println!("{}" , b));

    m.gsub(s , "");
}

The ⚠ is made up of the bytes [226 , 154 , 160] removing the 160 causes the string to become invalid unicode

ghost avatar Mar 29 '21 17:03 ghost

Thanks for the report - yes, this kind of thing is always possible with arbitrary byte patterns. Not sure what the policy here should be.

I think we need gsub_bytes - there's already gsub_bytes_with.

stevedonovan avatar Apr 05 '21 14:04 stevedonovan

Although what the format of the second replacement argument is - not yet sure!

stevedonovan avatar Apr 05 '21 15:04 stevedonovan

I think that any function that handles strings should just be a wrapper around the one which handles bytes but then calls vec.into(). such as

gsub(&mut self , text: &str , repl: &str) -> String{
    self.gsub_bytes(text.as_bytes() , repl.as_bytes).into()
}

this is a zero copy operation and only needs to scan for invalid bytes

ghost avatar Apr 08 '21 10:04 ghost

also does the functions passed to gsub_*_with really need to return an owned type could they not return a &[u8] or &str

ghost avatar Apr 08 '21 10:04 ghost

also does the functions passed to gsub_*_with really need to return an owned type could they not return a &[u8] or &str

We can do what the regex crate does, and return an appropriate Cow to optimize the case where no substitution took place.

stevedonovan avatar Apr 10 '21 08:04 stevedonovan

I think that any function that handles strings should just be a wrapper around the one which handles bytes but then calls vec.into(). such as

This is pretty much how gsub is implemented now - only really have to do the bytes case!

There is also gsub_checked which returns a Result, to avoid the panicking problem.

stevedonovan avatar Apr 10 '21 09:04 stevedonovan