rust
rust copied to clipboard
Eliminate 280-byte memset from ReadDir iterator
This guy:
https://github.com/rust-lang/rust/blob/1536ab1b383f21b38f8d49230a2aecc51daffa3d/library/std/src/sys/unix/fs.rs#L589
It turns out libc::dirent64
is quite big—https://docs.rs/libc/0.2.135/libc/struct.dirent64.html. In #103135 this memset accounted for 0.9% of the runtime of iterating a big directory.
Almost none of the big zeroed value is ever used. We memcpy a tiny prefix (19 bytes) into it, and then read just 9 bytes (d_ino
and d_type
) back out. We can read exactly those 9 bytes we need directly from the original entry_ptr instead.
History
This code got added in #93459 and tweaked in #94272 and #94750.
Prior to #93459, there was no memset but a full 280 bytes were being copied from the entry_ptr.
copy 280 bytes |
This was not legal because not all of those bytes might be initialized, or even allocated, depending on the length of the directory entry's name, leading to a segfault. That PR fixed the segfault by creating a new zeroed dirent64 and copying just the guaranteed initialized prefix into it.
memset 280 bytes | copy 19 bytes |
However this was still buggy because it used addr_of!((*entry_ptr).d_name)
, which is considered UB by Miri in the case that the full extent of entry_ptr is not in bounds of the same allocation. (Arguably this shouldn't be a requirement, but here we are.)
The UB got fixed by #94272 by replacing addr_of
with some pointer manipulation based on offset_from
, but still fundamentally the same operation.
memset 280 bytes | copy 19 bytes |
Then #94750 noticed that only 9 of those 19 bytes were even being used, so we could pick out only those 9 to put in the ReadDir value.
memset 280 bytes | copy 19 bytes | copy 9 bytes |
After my PR we just grab the 9 needed bytes directly from entry_ptr.
copy 9 bytes |
The resulting code is more complex but I believe still worthwhile to land for the following reason. This is an extremely straightforward thing to accomplish in C and clearly libc assumes that; literally just entry_ptr->d_name
. The extra work in comparison to accomplish it in Rust is not an example of any actual safety being provided by Rust. I believe it's useful to have uncovered that and think about what could be done in the standard library or language to support this obvious operation better.
References
- https://man7.org/linux/man-pages/man3/readdir.3.html
Hey! It looks like you've submitted a new PR for the library teams!
If this PR contains changes to any rust-lang/rust
public library APIs then please comment with @rustbot label +T-libs-api -T-libs
to tag it appropriately. If this PR contains changes to any unstable APIs please edit the PR description to add a link to the relevant API Change Proposal or create one if you haven't already. If you're unsure where your change falls no worries, just leave it as is and the reviewer will take a look and make a decision to forward on if necessary.
Examples of T-libs-api
changes:
- Stabilizing library features
- Introducing insta-stable changes such as new implementations of existing stable traits on existing stable types
- Introducing new or changing existing unstable library APIs (excluding permanently unstable features / features without a tracking issue)
- Changing public documentation in ways that create new stability guarantees
- Changing observable runtime behavior of library APIs
r? @Mark-Simulacrum
(rust-highfive has picked a reviewer for you, use r? to override)
@bors r+ rollup=never
:pushpin: Commit 0bb6eb15266b8476138860352d049da786f890f5 has been approved by Mark-Simulacrum
It is now in the queue for this repository.
:hourglass: Testing commit 0bb6eb15266b8476138860352d049da786f890f5 with merge 7fcf850d7942804990a1d2e3fe036622a0fe4c74...
:sunny: Test successful - checks-actions Approved by: Mark-Simulacrum Pushing 7fcf850d7942804990a1d2e3fe036622a0fe4c74 to master...
Finished benchmarking commit (7fcf850d7942804990a1d2e3fe036622a0fe4c74): comparison URL.
Overall result: ❌ regressions - no action needed
@rustbot label: -perf-regression
Instruction count
This is a highly reliable metric that was used to determine the overall result at the top of this comment.
mean[^1] | range | count[^2] | |
---|---|---|---|
Regressions ❌ (primary) |
0.6% | [0.6%, 0.6%] | 1 |
Regressions ❌ (secondary) |
- | - | 0 |
Improvements ✅ (primary) |
- | - | 0 |
Improvements ✅ (secondary) |
- | - | 0 |
All ❌✅ (primary) | 0.6% | [0.6%, 0.6%] | 1 |
Max RSS (memory usage)
This benchmark run did not return any relevant results for this metric.
Cycles
This benchmark run did not return any relevant results for this metric.
[^1]: the arithmetic mean of the percent change [^2]: number of relevant changes