pip icon indicating copy to clipboard operation
pip copied to clipboard

PERF: 7% faster pip install, remove fsync call in temporary file, not necessary

Open morotti opened this issue 1 year ago • 3 comments

Hello,

could I have a review on this code?

profiling on a run of pip install jupyter

_install_wheel() function is 3670 ms. that's the installation phase. fsync is 594 ms out of that, about 15% of the install phase (or 7% of the whole pip run).

It is called here 3 times https://github.com/pypa/pip/blob/8eadcab329a765571c849efe370afd5d5bba425c/src/pip/_internal/operations/install/wheel.py#L672

that's the end of the wheel installation, in _generate_file() decorator:

  • to create the dist-info/INSTALLER file that always contains "pip" string
  • to create the dist-info/direct-url-something.json that contains a one line json file with the path/url.
  • to create the dist-info/RECORD file that contains a list of all installed files (allegedly the most important file)

the way it works: it creates a temporary file in the same directory as the final path + flush() + fsync() + close() + atomic rename() the file to the final path. I think the fsync call is not necessary.

fsync calls are surprisingly expensive, I am always seeing multiple milliseconds on various types of disks on my machine and even on /tmp. there are only 3 files fsync'ed per extracted wheel, yet they take almost as long as everything else. I think the run has 5564 files calls to save() to extract files + 116 calls to fsync() for metadata files + for a total of 58 wheels to install.

image image

morotti avatar Jul 19 '24 12:07 morotti

This looks reasonable, but I don't know enough about filesystem management to be sure that the fsync call is unnecessary. The history of the code doesn't give any indication.

This was added in 2019 with a comment "The file is created securely and is ensured to be written to disk after the context reaches its end."

My best guess, the guy who wrote this code went over-the-top to both flush() and fsync() to ensure the content is written. He probably through the record file is very important and went all in! :D (The initial commit might have been bugged because it flushed the file result.file.flush() instead of the buffer result.flush())

As far as I understand, the fsync() is unnecessary because the flush() will flush AND the end of the with block will autoclose and flush too (closing is flushing). If something crashes like a power loss before the write is complete and the rename is done, it's okay because the file is still a temporary filename. It's the atomic rename() after write that ensures consistency. (By the way, I don't think that fsync() provides any guarantee that the data will be written to the media on modern hardware)

The code itself was probably a copy/pasted snippet from somewhere else (that didn't do atomic rename) or from previous code in pip (the commit message implies pip was writing in place before).

morotti avatar Jul 19 '24 15:07 morotti

I think this patch is ready to merge.

@ichard26 per the link you gave, ext4 filesystem is ensuring that a rename can only be written after pending writes to the file have completed. the fsync is not necessary. from the messages in the mailing list, they fix the filesystem because yum, the package manager on centos/rhel, was also doing rename after writes to ensure atomic changes and the filesystem had to handle it.

morotti avatar Jul 30 '24 15:07 morotti

Isn't that a kernel/filesystem level hack? You know that we have other operation systems and filesystems to support. I'm not saying that we need to support durable installations, but if we want to, then relying on an ext4 feature is not the way to go.

ichard26 avatar Jul 30 '24 15:07 ichard26

reping. forgot to follow up.

would be nice to merge the patch to get the performance improvement! :D

morotti avatar Dec 11 '24 15:12 morotti

@morotti looks like you forgot to respond to Richard's concern? It sounds like said guarantee is file system-specific and other file systems might behave differently. This might be important in the light of pip's cross-platform nature. Even under GNU/Linux distros, people tend to use different file systems, but beyond that, it's rarely possible to use the same file systems under different operating systems. Moreover, on GNU/Linux it's quite common for block devices to be layered on top of others (LUKS/LVM/btrfs). Do you know of implications this would have on them and file systems like NTFS/HFS/exfat etc.? What are the implications of sync on rename for those? Note that a rename across file system boundaries is typically implemented as copy+remove. It sounds safer to do that fsync syscall than not.

webknjaz avatar Dec 11 '24 18:12 webknjaz

sorry for delay, there is no issue as far as I am aware.

the long discussion above is misleading, it was mainly around a bug ticket on RHEL/CentOS/ext4. early versions of ext4 started reordering filesystem operations in unsafe ways and data caused corruption. this was noticed with yum install leaving empty files and that was quite a problem. it was fixed forever ago after the issue was understood. I'm not aware of any filesystem being affected by a similar issue.

by the way, the reason this code does a fsync() in the first place is because it is decades old code that used to write files in place which was unsafe (irrelevant of calling fsync or not). the code was fixed years ago to write to a separate file + flush + atomic rename, but they forgot to remove the fsync call.

morotti avatar Dec 15 '24 14:12 morotti

but they forgot to remove the fsync call

You've asserted this a few times now, but what's your proof? You said yourself that this is your "best guess". But isn't it just as likely that the fsync was left in because it addressed some corner case (possibly on an obscure platform) that you're not aware of?

Your arguments seem to suggest that the fsync syscall is essentially useless. If that's the case, why does it even exist? It's clearly not a no-op, as it takes a non-trivial amount of time to execute.

While performance improvements are nice, we shouldn't make them at the expense of correctness. So the blocking question here is twofold:

  1. Do we have documentation explaining why the fsync call in this situation is unnecessary? It's not sufficient for you to just assert that's the case.
  2. Can we check for the conditions that make it unnecessary, and only omit the fsync in those cases? Note, for example, that on Windows, the os.fsync function calls the MS CRT function _commit. Neither of the articles you linked to above discuss Windows at all, so why do you assume the behaviour there is the same as Linux?

As a reminder - I originally said that I was OK with this change as long as the other maintainers didn't have concerns. It turns out the other maintainers do have concerns, and if you don't address them, this PR will remain blocked. I'm simply trying to clarify what the blocking concern is, but it's not me you have to persuade, it's @ichard26 and @pradyunsg

pfmoore avatar Dec 15 '24 15:12 pfmoore