rust icon indicating copy to clipboard operation
rust copied to clipboard

Eliminate 280-byte memset from ReadDir iterator

Open dtolnay opened this issue 1 year ago • 2 comments

This guy:

https://github.com/rust-lang/rust/blob/1536ab1b383f21b38f8d49230a2aecc51daffa3d/library/std/src/sys/unix/fs.rs#L589

It turns out libc::dirent64 is quite big—https://docs.rs/libc/0.2.135/libc/struct.dirent64.html. In #103135 this memset accounted for 0.9% of the runtime of iterating a big directory.

Almost none of the big zeroed value is ever used. We memcpy a tiny prefix (19 bytes) into it, and then read just 9 bytes (d_ino and d_type) back out. We can read exactly those 9 bytes we need directly from the original entry_ptr instead.

History

This code got added in #93459 and tweaked in #94272 and #94750.

Prior to #93459, there was no memset but a full 280 bytes were being copied from the entry_ptr.

copy 280 bytes

This was not legal because not all of those bytes might be initialized, or even allocated, depending on the length of the directory entry's name, leading to a segfault. That PR fixed the segfault by creating a new zeroed dirent64 and copying just the guaranteed initialized prefix into it.

memset 280 bytescopy 19 bytes

However this was still buggy because it used addr_of!((*entry_ptr).d_name), which is considered UB by Miri in the case that the full extent of entry_ptr is not in bounds of the same allocation. (Arguably this shouldn't be a requirement, but here we are.)

The UB got fixed by #94272 by replacing addr_of with some pointer manipulation based on offset_from, but still fundamentally the same operation.

memset 280 bytescopy 19 bytes

Then #94750 noticed that only 9 of those 19 bytes were even being used, so we could pick out only those 9 to put in the ReadDir value.

memset 280 bytescopy 19 bytescopy 9 bytes

After my PR we just grab the 9 needed bytes directly from entry_ptr.

copy 9 bytes

The resulting code is more complex but I believe still worthwhile to land for the following reason. This is an extremely straightforward thing to accomplish in C and clearly libc assumes that; literally just entry_ptr->d_name. The extra work in comparison to accomplish it in Rust is not an example of any actual safety being provided by Rust. I believe it's useful to have uncovered that and think about what could be done in the standard library or language to support this obvious operation better.

References

  • https://man7.org/linux/man-pages/man3/readdir.3.html

dtolnay avatar Oct 17 '22 07:10 dtolnay

Hey! It looks like you've submitted a new PR for the library teams!

If this PR contains changes to any rust-lang/rust public library APIs then please comment with @rustbot label +T-libs-api -T-libs to tag it appropriately. If this PR contains changes to any unstable APIs please edit the PR description to add a link to the relevant API Change Proposal or create one if you haven't already. If you're unsure where your change falls no worries, just leave it as is and the reviewer will take a look and make a decision to forward on if necessary.

Examples of T-libs-api changes:

  • Stabilizing library features
  • Introducing insta-stable changes such as new implementations of existing stable traits on existing stable types
  • Introducing new or changing existing unstable library APIs (excluding permanently unstable features / features without a tracking issue)
  • Changing public documentation in ways that create new stability guarantees
  • Changing observable runtime behavior of library APIs

rustbot avatar Oct 17 '22 07:10 rustbot

r? @Mark-Simulacrum

(rust-highfive has picked a reviewer for you, use r? to override)

rust-highfive avatar Oct 17 '22 07:10 rust-highfive

@bors r+ rollup=never

Mark-Simulacrum avatar Oct 23 '22 02:10 Mark-Simulacrum

:pushpin: Commit 0bb6eb15266b8476138860352d049da786f890f5 has been approved by Mark-Simulacrum

It is now in the queue for this repository.

bors avatar Oct 23 '22 02:10 bors

:hourglass: Testing commit 0bb6eb15266b8476138860352d049da786f890f5 with merge 7fcf850d7942804990a1d2e3fe036622a0fe4c74...

bors avatar Oct 23 '22 18:10 bors

:sunny: Test successful - checks-actions Approved by: Mark-Simulacrum Pushing 7fcf850d7942804990a1d2e3fe036622a0fe4c74 to master...

bors avatar Oct 23 '22 21:10 bors

Finished benchmarking commit (7fcf850d7942804990a1d2e3fe036622a0fe4c74): comparison URL.

Overall result: ❌ regressions - no action needed

@rustbot label: -perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

mean[^1] range count[^2]
Regressions ❌
(primary)
0.6% [0.6%, 0.6%] 1
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) 0.6% [0.6%, 0.6%] 1

Max RSS (memory usage)

This benchmark run did not return any relevant results for this metric.

Cycles

This benchmark run did not return any relevant results for this metric.

[^1]: the arithmetic mean of the percent change [^2]: number of relevant changes

rust-timer avatar Oct 23 '22 22:10 rust-timer