coreutils icon indicating copy to clipboard operation
coreutils copied to clipboard

`tail`: fix argument parsing of sleep interval

Open Joining7943 opened this issue 2 years ago • 29 comments

Like discussed, I've extracted this from the refactoring pr of tail, which fixes the parsing of the sleep interval.

The main intend of this pr is to match the behavior of gnu's tail and properly parse a string in f64 format to a Duration.

Short summary:

  • The sleep interval is a f64 instead of a f32
  • Duration::from_secs_f64 panics and Duration::try_from_secs_f64 is not stable, so we need a separate function to accomplish the conversion from f64 to a Duration.

Joining7943 avatar Dec 16 '22 22:12 Joining7943

GNU testsuite comparison:

Congrats! The gnu test tests/rm/rm1 is no longer failing!
Congrats! The gnu test tests/rm/rm2 is no longer failing!

github-actions[bot] avatar Dec 16 '22 23:12 github-actions[bot]

GNU testsuite comparison:

Congrats! The gnu test tests/rm/rm1 is no longer failing!
Congrats! The gnu test tests/rm/rm2 is no longer failing!
GNU test failed: tests/tail-2/inotify-dir-recreate. tests/tail-2/inotify-dir-recreate is passing on 'main'. Maybe you have to rebase?

github-actions[bot] avatar Dec 16 '22 23:12 github-actions[bot]

GNU testsuite comparison:

Congrats! The gnu test tests/misc/timeout is no longer failing!
GNU test failed: tests/misc/tee. tests/misc/tee is passing on 'main'. Maybe you have to rebase?

github-actions[bot] avatar Dec 17 '22 21:12 github-actions[bot]

GNU testsuite comparison:

GNU test failed: tests/tail-2/inotify-dir-recreate. tests/tail-2/inotify-dir-recreate is passing on 'main'. Maybe you have to rebase?

github-actions[bot] avatar Dec 19 '22 21:12 github-actions[bot]

GNU testsuite comparison:

Congrats! The gnu test tests/tail-2/inotify-dir-recreate is no longer failing!

github-actions[bot] avatar Dec 24 '22 01:12 github-actions[bot]

How does this parse_duration() function relate to uucore::parse_time::from_str()? https://github.com/uutils/coreutils/blob/bdd6f56ae4e88c38190c6780ef6b5594996ee03d/src/uucore/src/lib/parser/parse_time.rs#L17-L49

jfinkels avatar Dec 25 '22 17:12 jfinkels

Not sure what you mean, but they don't relate I think. This function tries to provide the same functionality like gnu's tail parsing of the --sleep-interval flag. So, there are for example no units like s, ms etc. allowed and if the number on the command line overflows f64 there's an error. That's the main difference to the uucore::parse_time::from_str function as far as I can see.

Joining7943 avatar Dec 25 '22 23:12 Joining7943

Please do not merge. I'm just trying something, to get a more performant, precise and unlossy version of the parse_duration method.

Joining7943 avatar Dec 27 '22 15:12 Joining7943

@tertsdiepraam I've completely rewritten the parse_duration method and moved it because of its size into the parse module of tail.

This duration parser is almost as fast as Duration::from_secs_f64(input.parse::<f64>().unwrap), has the same grammar like a f64 parser but is not lossy (due to rounding and lack of floating point precision) by storing the exact digit representation of the input string. parse_duration is around 2 - 5 times slower depending on the size of the input string but still operates in the nano seconds domain and takes around 100 - 300 ns for normal input like 1.0e2 on my system. More extreme input like format!("{}.{}e-1022", "1".repeat(2000), "9".repeat(2000)) which can't even parsed without errors by the Duration::from_secs_f64 method still ran in around 5 microseconds on my quadcore system.

If the seconds overflow u64::MAX, the duration is interpreted as Duration::MAX, which is also an improvement over gnu's tail, I think. Imho we don't need to replace the parse_duration method with Duration::try_from_secs_f64 anymore. What do you think?

Joining7943 avatar Dec 31 '22 14:12 Joining7943

I'll have to look closely into the code later, but it sounds great! It does add significant complexity to the code over a single function call, but the advantages you list might be worth it.

Unrelated to the actual change, but only to the tests (which I only quickly skimmed), I'm not sure I like using rstest for creating multiple test cases. Personally, I think a for loop is easier to read and more approachable for new contributors.

tertsdiepraam avatar Dec 31 '22 14:12 tertsdiepraam

Ok, cool. Parameterized tests have the advantage that all test cases can run (and fail). A for loop stops at the first error encountered and it's maybe not clear which case actually failed. That's even worse in the CI, because we don't output the backtrace there. Parameterizing the tests helped me a lot during the writing. I actually don't think this is a huge obstacle for new contributors, since the syntax is quiet clear #[case::some_descriptive_name(...)] (even without knowing anything about rstest) and the output of an errored test case is parse::tests::my_test::case_1_some_descriptive_name pointing to the exact test case which went south by also providing a first hint. rstest is also well documented in my opinion, on one page in the github README and on docs.rs.

Joining7943 avatar Dec 31 '22 15:12 Joining7943

GNU testsuite comparison:

GNU test failed: tests/tail-2/inotify-dir-recreate. tests/tail-2/inotify-dir-recreate is passing on 'main'. Maybe you have to rebase?

github-actions[bot] avatar Jan 01 '23 13:01 github-actions[bot]

I'll have to look closely into the code later,

Have you looked at this yet?

Joining7943 avatar Jan 06 '23 19:01 Joining7943

I had the idea to move this into uucore, if this is too big for tail alone. It's fairly simple to add time unit parsing and make the time units customizable. Each uutil may have its own set of time units it accepts. Maybe we can do this once and for all uutils by also providing a speedy and precise parsing of a Duration. If required, I had some additional ideas to max out the parsing speed, but this would also add additional complexity. What do you think?

Joining7943 avatar Jan 08 '23 19:01 Joining7943

No offense, but I extracted the code I've written into an own crate https://crates.io/crates/fundu. The idea and much of the code is the same. I've taken care, that it's compatible with the requirements of uutils :) I don't intend to change that and I hope you see this as an advantage over the original post. It is also a lot of lines of code less in uutils. Would be happy to hear your opinion and maybe see fundu used in uutils.

Joining7943 avatar Feb 03 '23 20:02 Joining7943

GNU testsuite comparison:

GNU test failed: tests/tail-2/inotify-dir-recreate. tests/tail-2/inotify-dir-recreate is passing on 'main'. Maybe you have to rebase?

github-actions[bot] avatar Feb 06 '23 14:02 github-actions[bot]

GNU testsuite comparison:

GNU test failed: tests/tail-2/inotify-dir-recreate. tests/tail-2/inotify-dir-recreate is passing on 'main'. Maybe you have to rebase?

github-actions[bot] avatar Feb 06 '23 20:02 github-actions[bot]

GNU testsuite comparison:

GNU test failed: tests/tail-2/inotify-dir-recreate. tests/tail-2/inotify-dir-recreate is passing on 'main'. Maybe you have to rebase?

github-actions[bot] avatar Feb 06 '23 21:02 github-actions[bot]

@tertsdiepraam ping?

Joining7943 avatar Feb 07 '23 13:02 Joining7943

I'll take a look now :)

tertsdiepraam avatar Feb 08 '23 19:02 tertsdiepraam

I hope you see this as an advantage over the original post. It is also a lot of lines of code less in uutils.

Agreed, it's also much better documented and tested now, which is great! It's a nice crate for other projects too. Excellent work!

I think it would be interesting to show benchmarks against simple String -> f64 -> Duration to provide some proof that it really has a low (or no) overhead. You could also show some examples where that fails and where fundu succeeds in the README. You could also document a bit more clearly what the default units are. On the code side, the only thing I have to comment, is that the multiplier is a bit confusing because it's sometimes 10^-m and sometimes a normal multiplier. Why not split that into two functions or maybe a function that returns a multiplier and an exponent in a tuple? That might also clean up some branching.

tertsdiepraam avatar Feb 08 '23 19:02 tertsdiepraam

Thanks for your review. It's much appreciated.

I think it would be interesting to show benchmarks against simple String -> f64 -> Duration to provide some proof that it really has a low (or no) overhead

good idea. Currently, the comparison with String -> f64 -> Duration can be seen when running the benches. It's the reference function. fundu will always have some (low) overhead compared to Duration::from_secs_f64 combined with f64::from_str because of its precision and the integration of time unit parsing etc. Parsing some simple input with fundu takes around 50ns and the stdlib methods combined around 26ns on my testing machine. I think, in most use cases this difference won't be noticable. However, I'm working on lowering the difference by some additional nano seconds (maybe 5ns or so) for small input, but especially for large input.

You could also show some examples where that fails and where fundu succeeds in the README. You could also document a bit more clearly what the default units are.

yeah that would be good. It'll be part of 0.3.0. I added a lot of documentation for the public api in general there.

On the code side, the only thing I have to comment, is that the multiplier is a bit confusing because it's sometimes 10^-m and sometimes a normal multiplier. Why not split that into two functions or maybe a function that returns a multiplier and an exponent in a tuple? That might also clean up some branching.

you're right it's confusing. Do you want that method pub? However, I'll try out the tuple.

Joining7943 avatar Feb 08 '23 22:02 Joining7943

Do you want that method pub?

Not really, I was just checking out the code and this stood out :)

tertsdiepraam avatar Feb 08 '23 22:02 tertsdiepraam

ok :) I think this method is not really useful outside of the crate, so I'll remove it from the public api for now.

Joining7943 avatar Feb 08 '23 22:02 Joining7943

GNU testsuite comparison:

GNU test failed: tests/misc/timeout. tests/misc/timeout is passing on 'main'. Maybe you have to rebase?

github-actions[bot] avatar Feb 09 '23 14:02 github-actions[bot]

Are we done here?

Joining7943 avatar Feb 11 '23 16:02 Joining7943

We made a change to how we declare dependencies so the Cargo.toml conflicts, but apart from it's good!

tertsdiepraam avatar Feb 12 '23 15:02 tertsdiepraam

ok great! The errors in the ci aren't related to this pr as far as I can see. However, I'm triggering a rerun, so maybe the deny step recovers.

Joining7943 avatar Feb 12 '23 19:02 Joining7943

ok, looks like the deny step recovered.

Joining7943 avatar Feb 12 '23 20:02 Joining7943

Thanks :)

Joining7943 avatar Feb 16 '23 14:02 Joining7943