
Performance on NFS-mounted files is much improved by specifying buffering

Open: medoc92 opened this issue 1 year ago • 1 comment

This is probably not a mutagen issue, but it may be of interest anyway. I did not try to reproduce it in other contexts, so it may be quite specific to my setup. While doing mass tag extraction from an NFS-mounted file system, I found that passing buffering=4096 to the open() call in _utils.py yields a massive performance improvement (around 5x in my configuration).

Details:

  • Client system: "Ubuntu 22.04.1 LTS" Linux 5.15.0-56-generic Python 3.10.6
  • NFS server: Odroid hc4 : ARM running "Ubuntu 22.04.1 LTS" Linux 5.19.17-meson64
  • The volume is a 4TB spinning disk on the ARM system.

Without the buffering parameter, extracting tags from 3000 FLAC and MP3 files takes around 100 ms per file. With the buffering argument, this drops to around 22 ms per file.
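For anyone who wants to repeat the comparison, here is a minimal timing sketch. It is not the harness used for the numbers above: `mean_ms_per_file` and its `parse` argument are hypothetical names, and `parse` stands in for whatever tag-extraction call is being measured.

```python
import time


def mean_ms_per_file(paths, parse, buffering=-1):
    """Return the mean wall-clock time in milliseconds spent per file.

    `parse` is a hypothetical callable that takes an open binary file
    object and does the work being measured (e.g. tag extraction).
    `buffering=-1` keeps Python's default buffer-size heuristic.
    """
    paths = list(paths)
    if not paths:
        return 0.0
    start = time.perf_counter()
    for path in paths:
        with open(path, "rb", buffering=buffering) as fileobj:
            parse(fileobj)
    elapsed = time.perf_counter() - start
    return 1000.0 * elapsed / len(paths)
```

Calling it once with the default and once with `buffering=4096` on the same file list gives the two per-file figures. The OS and NFS client caches will skew a second pass over the same files, so the runs should be done on cold caches or on different file sets.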

I also did a quick test on a local SSD, on which the buffering does not appear to make a difference one way or another.

I ran these tests while trying to determine why recoll was slow at indexing NFS-mounted audio files. The workaround on the application side is to open the file with a buffering argument before building the mutagen object.
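A minimal sketch of that application-side workaround follows. It assumes a mutagen version whose `mutagen.File()` accepts an already opened binary file object in place of a path. The helper names `open_buffered` and `read_tags` are made up for illustration and are not part of mutagen or recoll.

```python
def open_buffered(path, buffer_size=4096):
    """Open `path` for binary reading with an explicit buffer size,
    bypassing Python's block-size heuristic."""
    return open(path, "rb", buffering=buffer_size)


def read_tags(path, buffer_size=4096):
    """Build the mutagen object from a file opened with a small buffer.

    Assumption: mutagen.File() accepts a file object. It returns None
    when the file type is not recognised.
    """
    import mutagen  # third-party; imported lazily so the helper above works without it

    with open_buffered(path, buffer_size) as fileobj:
        return mutagen.File(fileobj)
```

The file is closed when `read_tags` returns, so the returned object can be used for reading tags but would need a path or a fresh file object to save changes.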

This actually appears to be a Python bug, judging from the open() documentation in the Python manual:

Binary files are buffered in fixed-size chunks; the size of the buffer is chosen using a heuristic trying to determine the underlying device’s “block size” and falling back on [io.DEFAULT_BUFFER_SIZE](https://docs.python.org/3/library/io.html#io.DEFAULT_BUFFER_SIZE). On many systems, the buffer will typically be 4096 or 8192 bytes long.

So specifying buffering=4096 should be close to a no-op, and doing it as a precautionary default in mutagen should be innocuous enough.
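One way to see what the heuristic has to work with on a given mount is to compare the block size the OS reports for a file with `io.DEFAULT_BUFFER_SIZE`. The sketch below only reports those two inputs, since the exact rule `open()` applies differs between Python versions. `buffer_size_inputs` is a hypothetical helper name.

```python
import io
import os


def buffer_size_inputs(path):
    """Report the values open()'s buffer-size heuristic is based on.

    `st_blksize` is the preferred I/O block size reported by the OS for
    this file. It is missing on some platforms (e.g. Windows), in which
    case None is returned for it.
    """
    st = os.stat(path)
    return {
        "st_blksize": getattr(st, "st_blksize", None),
        "default_buffer_size": io.DEFAULT_BUFFER_SIZE,
    }
```

If an NFS mount reports an `st_blksize` much larger than 4096 (not checked for the setup described here), that would be consistent with the default buffer being far bigger than expected on NFS and roughly 4096 on the local SSD.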

medoc92, Dec 24 '22 17:12