orgize
Orgize validation fails when parsing certain unicode values
In general I expect weird unicode values to get "interesting" results, but I'm going to report this since it results in a panic when debug_assertions are enabled.
Each of these characters, alone, as input, results in a panic in debug builds. I recommend running the example below with --release as otherwise calling parse will panic.
Up to you as to whether it's worth fixing. I saw you had a fuzz test in the source tree so I assume that crashes like this might be of interest, but I can also understand not wanting to go down the unicode rabbithole and it's unclear to me how often these actually come up in real use.
The one or two I tested with org-element work correctly -- a headline containing them in the title is parsed correctly.
fn main() {
    let s = "\u{000b}\u{0085}\u{00a0}\u{1680}\u{2000}\u{2001}\u{2002}\u{2003}\u{2004}\u{2005}\u{2006}\u{2007}\u{2008}\u{2009}\u{200a}\u{2028}\u{2029}\u{202f}\u{205f}\u{3000}";
    for (i, c) in s.chars().enumerate() {
        let org = orgize::Org::parse_string(c.to_string());
        println!("Validation ok for {}: {}", i, org.validate().is_empty());
    }
}
Thanks for reporting. Orgize automatically validates the parsed struct and panics if any error occurs. This check is disabled in release mode to improve performance. As for the fuzz test, I believe it broke after I upgraded to the 2018 edition, but I keep forgetting to fix it.
Oh, I see. I only check for ASCII whitespace in some functions, but str::trim actually removes both ASCII and Unicode whitespace.
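The mismatch described above can be reproduced with the standard library alone. The following is a minimal sketch, not orgize's actual code: `trim_ascii_only` and `trims_disagree` are hypothetical helpers introduced only to contrast an ASCII-only whitespace check with what `str::trim` does.

```rust
/// Hypothetical helper: trims only ASCII whitespace, standing in for code
/// that checks whitespace with `is_ascii_whitespace`.
fn trim_ascii_only(s: &str) -> &str {
    s.trim_matches(|c: char| c.is_ascii_whitespace())
}

/// Hypothetical helper: true when `str::trim` (Unicode-aware) and the
/// ASCII-only trim produce different results for the same input.
fn trims_disagree(s: &str) -> bool {
    s.trim() != trim_ascii_only(s)
}

fn main() {
    // The same characters as in the report above.
    let s = "\u{000b}\u{0085}\u{00a0}\u{1680}\u{2000}\u{2001}\u{2002}\u{2003}\u{2004}\u{2005}\u{2006}\u{2007}\u{2008}\u{2009}\u{200a}\u{2028}\u{2029}\u{202f}\u{205f}\u{3000}";
    for c in s.chars() {
        let text = c.to_string();
        println!(
            "U+{:04X}: char::is_whitespace = {}, is_ascii_whitespace = {}, trims disagree = {}",
            c as u32,
            c.is_whitespace(),
            c.is_ascii_whitespace(),
            trims_disagree(&text)
        );
    }
}
```

Any input on which the two trims disagree is a candidate for this kind of validation failure, since one part of a parser may treat the character as content while another strips it as whitespace.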
This was fixed by ba9c83c. But I decided to keep this issue open as a reminder, and will close it once we replace every u8::is_ascii_whitespace with char::is_whitespace.