
Linux-specific improvements (BIG potential speedup)

Open alexcu2718 opened this issue 9 months ago • 3 comments

Hi All,

Links:

https://crates.io/crates/fdf

https://github.com/alexcu2718/fdf

https://github.com/alexcu2718/fdf/tree/main/fd_benchmarks

I've made a rough skeleton copy of fd.

I did this to learn Rust and C. I am terribly disorganised and wanted to combine my efforts into something where I use both.

So I did the natural thing and... made an overly complicated tool to fight it. I'm a genius; you don't need to tell me.

I have replicated about 30-40% of the features. Note that I can't be too bothered to recreate the rest.

I'm posting this here as a question, to see whether the maintainers would want me to develop the idea further and commit it to this project, or take it into my own project.

Some small issues I haven't touched yet, because it's still VERY much a work in progress:

  1. I have not implemented custom errors; it's pretty much `Box<dyn Error>` style (errors are handled at the wrap-up, not early on).

  2. I believe my parallelism attempts are far from ideal; I think I can refine my traversal strategy a lot more.

  3. I do not know how reliable my methodology would be on e.g. btrfs or ext2, since it relies on basic, cheap syscalls.
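For item 1, a common alternative to `Box<dyn Error>` is a small error enum that keeps the failing path attached to the I/O error. This is a hypothetical sketch: `WalkError` and `read_dir_checked` are illustrative names, not part of fd or fdf, and the variants are assumptions about what a walker might need.

```rust
use std::error::Error;
use std::fmt;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical error type for a directory walker (illustrative, not fdf's API).
#[derive(Debug)]
pub enum WalkError {
    /// An I/O failure, together with the path that triggered it.
    Io { path: PathBuf, source: io::Error },
    /// The user-supplied pattern could not be compiled.
    InvalidPattern(String),
}

impl fmt::Display for WalkError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            WalkError::Io { path, source } => write!(f, "{}: {}", path.display(), source),
            WalkError::InvalidPattern(p) => write!(f, "invalid pattern: {}", p),
        }
    }
}

impl Error for WalkError {
    // Expose the underlying io::Error so callers can inspect the root cause.
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            WalkError::Io { source, .. } => Some(source),
            WalkError::InvalidPattern(_) => None,
        }
    }
}

/// Opens a directory, attaching the path to any I/O error.
pub fn read_dir_checked(path: &Path) -> Result<fs::ReadDir, WalkError> {
    fs::read_dir(path).map_err(|source| WalkError::Io {
        path: path.to_path_buf(),
        source,
    })
}
```

Because `WalkError` implements `std::error::Error`, it can still be converted into a `Box<dyn Error>` with `?` at the top level, so handling errors at the wrap-up keeps working.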

Quick rundown of the methodology:

I basically remade `read_dir` so that it takes and returns raw bytes. This is handy because I can pass file names to regex without any conversion cost (and also recurse without any overhead!).
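A minimal sketch of the byte-oriented idea, assuming only the standard library. It uses `std::fs::read_dir` rather than the raw syscalls fdf reimplements, but keeps every file name as bytes (`OsStrExt::as_bytes`), so no UTF-8 validation or conversion happens before matching. A plain suffix test stands in for the regex here; the `regex` crate's `regex::bytes::Regex` accepts `&[u8]` in the same way. `find_by_suffix` is a hypothetical name.

```rust
use std::fs;
use std::io;
use std::os::unix::ffi::OsStrExt;
use std::path::{Path, PathBuf};

/// Recursively collects paths under `root` whose file name ends with `suffix`.
/// Names are compared as raw bytes; they are never converted to `String`.
pub fn find_by_suffix(root: &Path, suffix: &[u8], out: &mut Vec<PathBuf>) -> io::Result<()> {
    for entry in fs::read_dir(root)? {
        let entry = entry?;
        // On Unix an OsString is just bytes, so as_bytes() is a free view.
        let name = entry.file_name();
        if name.as_bytes().ends_with(suffix) {
            out.push(entry.path());
        }
        // file_type() does not follow symlinks, so symlinked dirs are not entered.
        if entry.file_type()?.is_dir() {
            find_by_suffix(&entry.path(), suffix, out)?;
        }
    }
    Ok(())
}
```

For example, `find_by_suffix(Path::new("/home/alexc"), b".jpg", &mut out)` collects every path ending in `.jpg` below that directory (hidden ones included), without yielding the start directory itself.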

I've minimised heap allocations, though probably not enough; I'm still very new to C/Rust.

By using cheaper syscalls than e.g. fstat, I keep the speed pretty damn good, and I get a lot of metadata for free. Notable exceptions are symlinks/executables; even so, filtering on these is still faster than fd.
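On Linux, each record returned by `getdents64` already carries the entry's type in `d_type`, which is presumably where much of the free metadata comes from. A sketch assuming only std: on most Unix platforms `DirEntry::file_type` is documented as needing no extra system call, and on Linux it only falls back to a stat call when the filesystem reports `DT_UNKNOWN`. Permission bits are not part of `d_type`, so an executable check still costs a stat per candidate file. On item 3, the readdir(3) man page lists Btrfs, ext2, ext3 and ext4 among the filesystems with full `d_type` support, and says all applications must handle `DT_UNKNOWN`. `count_kinds` and `KindCounts` are hypothetical names.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::PermissionsExt;
use std::path::Path;

/// Counts of entry kinds directly inside one directory (illustrative type).
#[derive(Debug, Default, PartialEq)]
pub struct KindCounts {
    pub dirs: usize,
    pub files: usize,
    pub symlinks: usize,
    pub executables: usize,
}

pub fn count_kinds(dir: &Path) -> io::Result<KindCounts> {
    let mut counts = KindCounts::default();
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        // Filled from d_type when available; no stat per entry in that case.
        let ft = entry.file_type()?;
        if ft.is_symlink() {
            counts.symlinks += 1;
        } else if ft.is_dir() {
            counts.dirs += 1;
        } else if ft.is_file() {
            counts.files += 1;
            // Permission bits are not in d_type, so this needs an lstat-style call.
            if (entry.metadata()?.permissions().mode() & 0o111) != 0 {
                counts.executables += 1;
            }
        }
    }
    Ok(counts)
}
```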

There's a lot of unsafe code in here, mostly raw pointer casts. I've tested it on recent Arch and Debian installs, and it runs a lot quicker with no UB issues observed.

NOTE:

I HAVE NOT DONE THE 'NO PATTERN' CASE, as there are some weird bugs where the results don't line up. (There are weird issues with either truncation or an extra slash being added? Not sure. Given that the rest of the benchmarks are spot on, I'm wondering if it's temporary files or similar.)

The benchmarks shown here produce 100% matching results (and IT IS MUCH FASTER).

The following benchmarks (works on my machine™):

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `fdf -HI '.[0-9].jpg$' '/home/alexc'` | 354.1 ± 1.3 | 352.6 | 356.6 | 5.88 ± 0.08 |
| `fdf '.[0-9].jpg$' '/home/alexc'` | 60.2 ± 0.8 | 59.1 | 63.8 | 1.00 |
| `fd -HI '.[0-9].jpg$' '/home/alexc'` | 460.0 ± 13.8 | 446.8 | 490.4 | 7.64 ± 0.25 |
| `fd '.[0-9].jpg$' '/home/alexc'` | 152.2 ± 1.1 | 150.4 | 154.8 | 2.53 ± 0.04 |

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `fdf -HI --extension 'jpg' '' '/home/alexc'` | 451.2 ± 2.7 | 447.8 | 456.0 | 1.00 |
| `fd -HI --extension 'jpg' '' '/home/alexc'` | 669.9 ± 13.0 | 659.1 | 703.1 | 1.48 ± 0.03 |

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `fdf . '/home/alexc' -HI --type l` | 489.0 ± 2.2 | 484.6 | 491.7 | 1.00 |
| `fd -HI '' '/home/alexc' --type l` | 622.2 ± 3.2 | 616.3 | 625.9 | 1.27 ± 0.01 |
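The Relative column in each table is consistent with dividing each mean by the fastest mean in that table and propagating the uncertainty by adding the two relative standard deviations in quadrature (e.g. 460.0 / 60.2 ≈ 7.64). This is my reading of how hyperfine derives it, checked only against the numbers above; `relative` is a hypothetical helper name.

```rust
/// Slowdown of a command relative to the fastest one, with its uncertainty.
/// Relative errors (stddev / mean) of both measurements are added in quadrature.
pub fn relative(mean: f64, stddev: f64, fastest_mean: f64, fastest_stddev: f64) -> (f64, f64) {
    let ratio = mean / fastest_mean;
    let rel_err = ((stddev / mean).powi(2) + (fastest_stddev / fastest_mean).powi(2)).sqrt();
    (ratio, ratio * rel_err)
}
```

For example, `relative(460.0, 13.8, 60.2, 0.8)` gives roughly (7.64, 0.25), matching the `fd -HI` row of the first table.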

I will say that this has some pretty IFFY* performance choices in places; mostly I wanted to get the main skeleton working. I'm also aware I might need to totally redesign some aspects. What do you expect from a guy who's only been learning for 4 months and is sick of his shitty Python/bash/C# job?

(*though I think my DirEntry is pretty damn good efficiency-wise!)

So, please let me know your thoughts. If you'd like me to do a proper rewrite and would accept the code (if it looked good), I'd be happy to do so.

Thanks,

Alex

alexcu2718, Mar 05 '25 22:03

I've added the fixes for the no-pattern case to my repo.

There are some weird bits around excluding the start dir, and I have no idea why .xonshrc/.bash_logout get excluded.

Copy-pasting the results below:

Summary: `fdf '' '/home/alexc' -HI` ran 1.54 ± 0.04 times faster than `fd '' '/home/alexc' -HI`

    WARNING: There were differences between the search results of fd and find! Run 'diff /tmp/results.fd /tmp/results.find'.
    the count of files in the results.fd are 2426601
    the count of files in the results.find are 2426600
    the total difference are 6

    ❯ diff /tmp/results.fd /tmp/results.find
    0a1
    > /home/alexc
    8d8
    < /home/alexc/.bash_logout
    2425419d2425418
    < /home/alexc/.xonshrc

alexcu2718, Mar 05 '25 23:03

I'm not really sure what the purpose of this issue is. Do you think there is something from your code that could be applied to fd to make it faster?

tmccombs, Mar 11 '25 04:03

> I'm not really sure what the purpose of this issue is. Do you think there is something from your code that could be applied to fd to make it faster?

I talked with sharkdp via email; he basically told me to post it as an issue.

Essentially, I've put this here as a proof of concept of how to increase the speed. Given that it's a total rewrite for Linux only (maybe BSD; no idea about macOS), it's probably more effort than it's worth, so I was debating whether the potential speed increase justifies a total rewrite. In my benchmarks it's at least a 1.4x speedup, sometimes up to 3x, so I wondered if that was of interest.

Otherwise I'll probably just make my own utility.

alexcu2718, Mar 11 '25 11:03