picard icon indicating copy to clipboard operation
picard copied to clipboard

Cythonize picard

Open Gabrielcarvfer opened this issue 3 years ago • 10 comments

Summary

  • This is a…
    • [ ] Bug fix
    • [ ] Feature addition
    • [ ] Refactoring
    • [ ] Minor / simple change (like a typo)
    • [x] Other
  • Describe this change in 1-2 sentences: Make performance profiling easier by having debugging symbols that can be traced by perf/AMD uProf/Intel VTune/etc.

Problem

  • JIRA ticket (optional): PICARD-XXX

Profiling Picard is quite hard as most of Python profilers can't see methods invoked by Qt/C. All of them end up mashed together in the exec_ or self (for objects inheriting Qt classes).

Solution

Automatically find Picard sources that can be compiled and use their compiled versions if Cython is available.

Action

Gabrielcarvfer avatar Mar 28 '21 13:03 Gabrielcarvfer

The "cythonization" of tagger, file and metadatabox (second commit) is just an example of changes that needed to be made to compile the sources.

Examples of interesting stuff that you can see by doing this:

CPU load of format (a.k.a. decoding mp3 with mutagen)

image

CPU load of coverart processing

image

Branches and cache misses during selection of multiple files (how I found the issue in #1770)

image

image

Measuring with pyinstrument for 1k files, mostly in albums

image

Gabrielcarvfer avatar Mar 28 '21 13:03 Gabrielcarvfer

This is an interesting PR. But as far as I can see, it will cythonize if it possibly can.

Given the significance of this, can I suggest that you run it only if a command line parameter is added - "-c" perhaps?

Sophist-UK avatar Mar 28 '21 13:03 Sophist-UK

Sure, I was thinking about that. Not sure how many people run it from source and build it though.

Gabrielcarvfer avatar Mar 28 '21 13:03 Gabrielcarvfer

What's the reason to factor out all those _() calls? This'll get horrible if we need to apply it to more of the code base.

Can we find a way to make _() and related functions available to the cythonized modules instead?

phw avatar Mar 28 '21 14:03 phw

What's the reason to factor out all those _() calls? This'll get horrible if we need to apply it to more of the code base.

image

Can we find a way to make _() and related functions available to the cythonized modules instead?

Yup, looking into it.

Gabrielcarvfer avatar Mar 28 '21 14:03 Gabrielcarvfer

Putting the following in picard/__init__.py and excluding a few specific sources did the trick.

import builtins
from gettext import ngettext
builtins.__dict__['N_'] = lambda a: a
builtins.__dict__['ngettext'] = ngettext

Excluded files:

./picard/i18n.py => pgettext_attributes
./picard/ui/options/releases.py => gettext_countries
./picard/coverart/utils.py => pgettext_attributes

Gabrielcarvfer avatar Mar 28 '21 14:03 Gabrielcarvfer

Hmm, very weird. Now Picard can only have a single window (clicking to open the settings window actually show up on the main window).

EDIT: Weird. It did the same without compilation. I think I never noticed the settings showed up on the main window.

Gabrielcarvfer avatar Mar 28 '21 14:03 Gabrielcarvfer

Hmm, very interesting stuff. It turns out the lock in DataHash.init is held during the md5 calculation (which can take some time) + time to write the temporary file (which can take a very long time, since we're also reading files at the same time).

Click to show
_datafile_mutex.lock()
try:
    m = md5()
    m.update(data)
    self._hash = m.hexdigest()
    if self._hash not in _datafiles:
        (fd, self._filename) = tempfile.mkstemp(prefix=prefix, suffix=suffix)
        QObject.tagger.register_cleanup(self.delete_file)
        with os.fdopen(fd, "wb") as imagefile:
            imagefile.write(data)
            _datafiles[self._hash] = self._filename
            log.debug("Saving image data %s to %r" % (self._hash, self._filename))
    else:
        self._filename = _datafiles[self._hash]
finally:
    _datafile_mutex.unlock()

After a quick look, I think there's no need to lock before the md5 nor to keep the lock while writing the file. Taking the MD5 out of the lock block and splitting it into two locks reduces mean time to process from 130ms to 20ms per cover.

Click to show
m = md5()
m.update(data)
self._hash = m.hexdigest()
_datafile_mutex.lock()
if self._hash not in _datafiles:
    _datafile_mutex.unlock()
    (fd, self._filename) = tempfile.mkstemp(prefix=prefix, suffix=suffix)
    QObject.tagger.register_cleanup(self.delete_file)
    with os.fdopen(fd, "wb") as imagefile:
        imagefile.write(data)

    log.debug("Saving image data %s to %r" % (self._hash, self._filename))
    _datafile_mutex.lock()
    _datafiles[self._hash] = self._filename
else:
    self._filename = _datafiles[self._hash]
_datafile_mutex.unlock()

Probability density (sums to 100%) image

Gabrielcarvfer avatar Mar 29 '21 04:03 Gabrielcarvfer

Sorry, did not have the time yet to look at this too closely. But I searched my local branches and found the one where I also played with cythonizing stuff:

https://github.com/phw/picard/commit/c4941ec9e52505f8c3ef340648c8f45680ea8d49

In this I only did it for a few selected modules for which I thought there would be a performance benefit. I also tackled the _N issue a bit different by explicitly defining it in the module using it. Not sure which is better. The important part here is I think that the actually global gets set when picard.i18n gets set up.

My concern with this patch was performance. And indeed the individual modules became much faster. The overall impact on Picard however was not that noticable. I still think it might be a good idea to do this anyway, just did not dare to officially include this yet :)

Anyway, as I understand your concern with this PR is more about the profiling and getting actual symbols to help identifying the relevant code paths.

For me the question is:

  • Do we have any actual downside of cythonizing everything (if possible)?
  • Do we have any benefit of cythonizing everything (if possible)?
  • Should we cythonize everything in official builds?

In any case I think it should be optional during build. Both your and mine patch did this by actually only cythonize if cython is available. But actually I would be in favor of a flag as proposed by @Sophist-UK .

phw avatar Mar 29 '21 13:03 phw

My concern with this patch was performance. And indeed the individual modules became much faster. The overall impact on Picard however was not that noticeable.

We will never see gains like those from PICARD-1843 and PICARD-1844 again. Overall impact is basically null because locking and IO prevent the CPU from processing, and if the CPU process indiscriminately, then the interface freezes (this is why I didn't open a PR for the double locking I've shown).

Anyway, as I understand your concern with this PR is more about the profiling and getting actual symbols to help identifying the relevant code paths.

For me the question is:

  • Do we have any actual downside of cythonizing everything (if possible)?

Build size (~500MB). The mainwindow.py binary is the biggest with 40MB. Itemviews.py comes second with 13MB. Now that the N_ issue got solved, I think it is possible to compile every source into a single big chungus library. I'm going to test if it can trim some of that size down.

  • Do we have any benefit of cythonizing everything (if possible)?

Debugging symbols.

  • Should we cythonize everything in official builds?

No opinions on this. It takes quite some time to compile everything, and cython is pretty verbose. At the same time, some modules have speedups of 2-4x without any additional changes.

In any case I think it should be optional during build. Both your and mine patch did this by actually only cythonize if cython is available. But actually I would be in favor of a flag as proposed by @Sophist-UK .

I've included a '-cythonize' option. '-c' is reserved.

Gabrielcarvfer avatar Mar 29 '21 14:03 Gabrielcarvfer

After a short discussion with zas I'm going to close this for now. This was valuable to explore the results, but the minor gains are not worth the extended maintenance and support requirements.

If we would do something like this in the future I would be more in favor of a more minimal approach of only cythonizing selected modules. But even that showed little performance benefits overall. See also my comment above.

phw avatar Jun 11 '23 16:06 phw