Isn't your Benchmark misleading?
What version of fd are you using?
fd 10.2.0
If I understood correctly, fd tries to use all available CPU cores by default, and I think that accounts for most of its performance advantage.
It would be better to state this explicitly in the Benchmark section and also show the comparison when only one thread is used via --threads=1.
The third bullet point in the features list of the README explicitly states that the speed is due to parallelization:
Very fast due to parallelized directory traversal.
(emphasis mine)
find is always single-threaded.
Using all of your cores is why fd is faster.
Well, fd -j1 is also faster than find for me, generally. But the parallelism is why it's significantly faster in practice.
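One way to separate the effect of parallelism from everything else is to time find, single-threaded fd, and default fd side by side. The following is only a sketch under assumptions: it presumes hyperfine and fd are on PATH, and the search root and pattern are placeholders rather than anything taken from this thread.

```shell
#!/bin/sh
# Sketch: compare find, single-threaded fd, and default (multi-threaded) fd.
# Assumptions: hyperfine and fd are installed; ROOT and PATTERN are
# placeholders to adapt to your own directory tree.
ROOT="${ROOT:-.}"
PATTERN='.*[0-9]\.jpg$'

# find's -iregex is matched against the whole path, hence the leading '.*'.
FIND_CMD="find '$ROOT' -iregex '$PATTERN'"
# -HI makes fd include hidden and ignored files, as find does.
FD_ONE_THREAD="fd -HI --threads=1 '$PATTERN' '$ROOT'"
FD_DEFAULT="fd -HI '$PATTERN' '$ROOT'"

if command -v hyperfine >/dev/null 2>&1 && command -v fd >/dev/null 2>&1; then
  # --warmup runs each command a few times first, so this is a hot-cache test.
  hyperfine --warmup 3 "$FIND_CMD" "$FD_ONE_THREAD" "$FD_DEFAULT"
else
  echo "hyperfine or fd not found; commands that would be compared:"
  printf '%s\n' "$FIND_CMD" "$FD_ONE_THREAD" "$FD_DEFAULT"
fi
```

If the single-threaded run is already ahead of find, the remaining gap to the default run is a rough measure of what the extra cores contribute on that machine.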
I've heard fd is much faster than find, which interested me, since my /home contains over 33 million files in half a million directories. As you can imagine, file searches are not particularly fast.
I've benchmarked find and fd versions available in Debian 12.
I've tried both hot-cache and cold-cache regex searches, the case where fd was advertised as being massively faster.
Hot cache:
time find ~/ -xdev -iregex "^config.*\.json$"
real 0m14.673s
user 0m8.657s
sys 0m5.962s
time fd --unrestricted --xdev "^config.*\.json$" ~/
real 0m18.670s
user 1m15.239s
sys 1m48.359s
On a hot cache fd is somewhat slower (18.7s vs. 14.7s wall-clock), while consuming a massive 12.5x the CPU time.
Cold cache:
echo 3 > /proc/sys/vm/drop_caches
time find ~/ -xdev -iregex "^config.*\.json$"
real 0m25.325s
user 0m8.953s
sys 0m8.514s
time fd --unrestricted --xdev "^config.*\.json$" ~/
real 0m19.849s
user 1m12.760s
sys 1m53.222s
On a cold cache fd is somewhat faster (19.8s vs. 25.3s wall-clock), with roughly the same order-of-magnitude overhead in CPU time.
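The ratios quoted for these runs can be reproduced from the `time` output: the wall-clock ratio compares the `real` values, and CPU time is `user` plus `sys`. A minimal sketch using the hot-cache figures above; the helper names (`to_seconds`, `cpu_seconds`, `ratio`) are made up for illustration.

```shell
#!/bin/sh
# Force a '.' decimal separator regardless of the user's locale.
export LC_ALL=C

# Convert a `time`-style duration such as "1m15.239s" to seconds.
to_seconds() {
  printf '%s\n' "$1" | awk -F'[ms]' '{ printf "%.3f", $1 * 60 + $2 }'
}

# Total CPU time in seconds: user + sys.
cpu_seconds() {
  awk -v u="$(to_seconds "$1")" -v s="$(to_seconds "$2")" \
    'BEGIN { printf "%.3f", u + s }'
}

# Ratio of two numbers, rounded to two decimals.
ratio() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f", a / b }'
}

# Hot-cache numbers from the runs above (Debian 12 builds).
find_real=$(to_seconds 0m14.673s)
fd_real=$(to_seconds 0m18.670s)
find_cpu=$(cpu_seconds 0m8.657s 0m5.962s)
fd_cpu=$(cpu_seconds 1m15.239s 1m48.359s)

wall_ratio=$(ratio "$fd_real" "$find_real")
cpu_ratio=$(ratio "$fd_cpu" "$find_cpu")

echo "fd wall-clock time relative to find: ${wall_ratio}x"
echo "fd CPU time relative to find: ${cpu_ratio}x"
```

With these inputs the script reports a wall-clock ratio of 1.27 and a CPU-time ratio of 12.56, in line with the roughly 12.5x figure quoted above.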
fd versions available in Debian 12.
What version is this? Looks like 8.6.0? It would be worth trying a newer version; a lot of performance improvements were introduced in 9.0.0.
I've built current master with the release target. Here are the results:
Hot cache:
time ~/temp/fd/target/release/fd -HI --unrestricted --color never --xdev "^config.*\.json$" ~/
real 0m3.675s
user 0m21.727s
sys 0m15.676s
fd was 4x faster than find, while consuming 2.5x as much CPU time.
Cold cache:
time ~/temp/fd/target/release/fd -HI --unrestricted --color never --xdev "^config.*\.json$" ~/
real 0m5.645s
user 0m19.529s
sys 0m23.103s
fd was 4.5x faster while maintaining the same 2.5x CPU time overhead.
In conclusion, it is true that current fd master is indeed significantly faster than find.
Yet perhaps not nearly as fast as it is claimed to be.
I do agree with the title of the issue: the current benchmarks look misleading to me. Adding a warning to the README about the performance of older prebuilt versions might be a good idea.
@bedilbek I'm genuinely trying to understand: what is misleading here? Misleading users into believing what? It is up to the tool to utilize the available resources to perform efficiently. The end user does not, and should not, care about the implementation details of the software.
Is it faster than 'find' for listing/searching files? Yes. Then the author has no obligation to explain why or how it is faster.
If not specifying that the tool uses multiple threads underneath is misleading, then is it also misleading not to specify that the tool is faster because it uses SIMD?
If you have a different perspective on this I'm happy to listen. ☮