coreutils
coreutils copied to clipboard
`tail`: fix argument parsing of sleep interval
Like discussed, I've extracted this from the refactoring pr of tail, which fixes the parsing of the sleep interval.
The main intend of this pr is to match the behavior of gnu's tail and properly parse a string in f64 format to a Duration.
Short summary:
- The sleep interval is a
f64instead of af32 Duration::from_secs_f64panics andDuration::try_from_secs_f64is not stable, so we need a separate function to accomplish the conversion fromf64to aDuration.
GNU testsuite comparison:
Congrats! The gnu test tests/rm/rm1 is no longer failing!
Congrats! The gnu test tests/rm/rm2 is no longer failing!
GNU testsuite comparison:
Congrats! The gnu test tests/rm/rm1 is no longer failing!
Congrats! The gnu test tests/rm/rm2 is no longer failing!
GNU test failed: tests/tail-2/inotify-dir-recreate. tests/tail-2/inotify-dir-recreate is passing on 'main'. Maybe you have to rebase?
GNU testsuite comparison:
Congrats! The gnu test tests/misc/timeout is no longer failing!
GNU test failed: tests/misc/tee. tests/misc/tee is passing on 'main'. Maybe you have to rebase?
GNU testsuite comparison:
GNU test failed: tests/tail-2/inotify-dir-recreate. tests/tail-2/inotify-dir-recreate is passing on 'main'. Maybe you have to rebase?
GNU testsuite comparison:
Congrats! The gnu test tests/tail-2/inotify-dir-recreate is no longer failing!
How does this parse_duration() function relate to uucore::parse_time::from_str()? https://github.com/uutils/coreutils/blob/bdd6f56ae4e88c38190c6780ef6b5594996ee03d/src/uucore/src/lib/parser/parse_time.rs#L17-L49
Not sure what you mean, but they don't relate I think. This function tries to provide the same functionality like gnu's tail parsing of the --sleep-interval flag. So, there are for example no units like s, ms etc. allowed and if the number on the command line overflows f64 there's an error. That's the main difference to the uucore::parse_time::from_str function as far as I can see.
Please do not merge. I'm just trying something, to get a more performant, precise and unlossy version of the parse_duration method.
@tertsdiepraam I've completely rewritten the parse_duration method and moved it because of its size into the parse module of tail.
This duration parser is almost as fast as Duration::from_secs_f64(input.parse::<f64>().unwrap), has the same grammar like a f64 parser but is not lossy (due to rounding and lack of floating point precision) by storing the exact digit representation of the input string. parse_duration is around 2 - 5 times slower depending on the size of the input string but still operates in the nano seconds domain and takes around 100 - 300 ns for normal input like 1.0e2 on my system. More extreme input like format!("{}.{}e-1022", "1".repeat(2000), "9".repeat(2000)) which can't even parsed without errors by the Duration::from_secs_f64 method still ran in around 5 microseconds on my quadcore system.
If the seconds overflow u64::MAX, the duration is interpreted as Duration::MAX, which is also an improvement over gnu's tail, I think. Imho we don't need to replace the parse_duration method with Duration::try_from_secs_f64 anymore. What do you think?
I'll have to look closely into the code later, but it sounds great! It does add significant complexity to the code over a single function call, but the advantages you list might be worth it.
Unrelated to the actual change, but only to the tests (which I only quickly skimmed), I'm not sure I like using rstest for creating multiple test cases. Personally, I think a for loop is easier to read and more approachable for new contributors.
Ok, cool. Parameterized tests have the advantage that all test cases can run (and fail). A for loop stops at the first error encountered and it's maybe not clear which case actually failed. That's even worse in the CI, because we don't output the backtrace there. Parameterizing the tests helped me a lot during the writing. I actually don't think this is a huge obstacle for new contributors, since the syntax is quiet clear #[case::some_descriptive_name(...)] (even without knowing anything about rstest) and the output of an errored test case is parse::tests::my_test::case_1_some_descriptive_name pointing to the exact test case which went south by also providing a first hint. rstest is also well documented in my opinion, on one page in the github README and on docs.rs.
GNU testsuite comparison:
GNU test failed: tests/tail-2/inotify-dir-recreate. tests/tail-2/inotify-dir-recreate is passing on 'main'. Maybe you have to rebase?
I'll have to look closely into the code later,
Have you looked at this yet?
I had the idea to move this into uucore, if this is too big for tail alone. It's fairly simple to add time unit parsing and make the time units customizable. Each uutil may have its own set of time units it accepts. Maybe we can do this once and for all uutils by also providing a speedy and precise parsing of a Duration. If required, I had some additional ideas to max out the parsing speed, but this would also add additional complexity. What do you think?
No offense, but I extracted the code I've written into an own crate https://crates.io/crates/fundu. The idea and much of the code is the same. I've taken care, that it's compatible with the requirements of uutils :) I don't intend to change that and I hope you see this as an advantage over the original post. It is also a lot of lines of code less in uutils. Would be happy to hear your opinion and maybe see fundu used in uutils.
GNU testsuite comparison:
GNU test failed: tests/tail-2/inotify-dir-recreate. tests/tail-2/inotify-dir-recreate is passing on 'main'. Maybe you have to rebase?
GNU testsuite comparison:
GNU test failed: tests/tail-2/inotify-dir-recreate. tests/tail-2/inotify-dir-recreate is passing on 'main'. Maybe you have to rebase?
GNU testsuite comparison:
GNU test failed: tests/tail-2/inotify-dir-recreate. tests/tail-2/inotify-dir-recreate is passing on 'main'. Maybe you have to rebase?
@tertsdiepraam ping?
I'll take a look now :)
I hope you see this as an advantage over the original post. It is also a lot of lines of code less in uutils.
Agreed, it's also much better documented and tested now, which is great! It's a nice crate for other projects too. Excellent work!
I think it would be interesting to show benchmarks against simple String -> f64 -> Duration to provide some proof that it really has a low (or no) overhead. You could also show some examples where that fails and where fundu succeeds in the README. You could also document a bit more clearly what the default units are. On the code side, the only thing I have to comment, is that the multiplier is a bit confusing because it's sometimes 10^-m and sometimes a normal multiplier. Why not split that into two functions or maybe a function that returns a multiplier and an exponent in a tuple? That might also clean up some branching.
Thanks for your review. It's much appreciated.
I think it would be interesting to show benchmarks against simple String -> f64 -> Duration to provide some proof that it really has a low (or no) overhead
good idea. Currently, the comparison with String -> f64 -> Duration can be seen when running the benches. It's the reference function. fundu will always have some (low) overhead compared to Duration::from_secs_f64 combined with f64::from_str because of its precision and the integration of time unit parsing etc. Parsing some simple input with fundu takes around 50ns and the stdlib methods combined around 26ns on my testing machine. I think, in most use cases this difference won't be noticable. However, I'm working on lowering the difference by some additional nano seconds (maybe 5ns or so) for small input, but especially for large input.
You could also show some examples where that fails and where fundu succeeds in the README. You could also document a bit more clearly what the default units are.
yeah that would be good. It'll be part of 0.3.0. I added a lot of documentation for the public api in general there.
On the code side, the only thing I have to comment, is that the multiplier is a bit confusing because it's sometimes 10^-m and sometimes a normal multiplier. Why not split that into two functions or maybe a function that returns a multiplier and an exponent in a tuple? That might also clean up some branching.
you're right it's confusing. Do you want that method pub? However, I'll try out the tuple.
Do you want that method pub?
Not really, I was just checking out the code and this stood out :)
ok :) I think this method is not really useful outside of the crate, so I'll remove it from the public api for now.
GNU testsuite comparison:
GNU test failed: tests/misc/timeout. tests/misc/timeout is passing on 'main'. Maybe you have to rebase?
Are we done here?
We made a change to how we declare dependencies so the Cargo.toml conflicts, but apart from it's good!
ok great! The errors in the ci aren't related to this pr as far as I can see. However, I'm triggering a rerun, so maybe the deny step recovers.
ok, looks like the deny step recovered.
Thanks :)