Speed ups from compiling with specific arch

Open betatim opened this issue 8 years ago • 5 comments

We should discuss how best to deal with the fact that compilers are getting smarter, but only if you tell them which architecture you are targeting. For example, https://godbolt.org/g/8EyZEJ counts the number of set bits; with -march=haswell (Haswell or newer, i.e. any not very old Intel CPU) this compiles to a single instruction made specifically for the job. Remove the -march=haswell to see the long form.

On my desktop, compiling khmer with -march=skylake gives a speedup of a few percent.

I'm not sure what the recommended arch is for binaries distributed via PyPI, but I'd bet it isn't -march=haswell, so we can't just put it into setup.py.

Credit for making me think about this goes to https://www.youtube.com/watch?v=bSkpMdDe4g4, which also mentions various other tricks.

betatim avatar Oct 12 '17 08:10 betatim

This is surprisingly difficult to do. import platform; platform.platform() can tell you, if your system feels like it; parsing /proc/cpuinfo can as well, once again, if your system feels like it. On my lab desktop, /proc/cpuinfo tells me that I have an i7-3820, but not that it's a Sandy Bridge chip. On my MacBook, /usr/sbin/sysctl -e machdep.cpu gives me a bunch of numerical codes for model, family, etc., which can probably be translated but aren't informative on their own. The best option is probably to let users pass in their own -march, but I don't know if that can be done with pip either...
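For illustration, here is a minimal sketch of the /proc/cpuinfo route. It assumes Linux on x86, where the file contains a "flags" line; the helper names parse_cpu_flags and read_cpu_flags are made up for this example. Note that it yields instruction-set feature flags (popcnt, avx2, ...) rather than a microarchitecture name, which is exactly why it does not translate directly into a -march value.

```python
def parse_cpu_flags(cpuinfo_text):
    """Return the set of feature flags found in /proc/cpuinfo-style text.

    Assumes the x86 Linux layout, where each processor block contains a
    line such as "flags : fpu vme ... popcnt ...". Returns an empty set
    when no such line is present.
    """
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def read_cpu_flags(path="/proc/cpuinfo"):
    """Read the flags of the running machine; empty set if unavailable."""
    try:
        with open(path) as handle:
            return parse_cpu_flags(handle.read())
    except OSError:
        # No /proc/cpuinfo here (e.g. macOS or Windows).
        return set()


if __name__ == "__main__":
    flags = read_cpu_flags()
    print("popcnt available:", "popcnt" in flags)
    print("avx2 available:", "avx2" in flags)
```

On systems without /proc/cpuinfo the sketch simply reports nothing, which matches the "if your system feels like it" caveat above.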

camillescott avatar Oct 18 '17 04:10 camillescott

-march=native seems to do the right thing when testing on my laptop (super old, not a Haswell) and my Linux desktop.

That doesn't answer the question of which arch we should use when building binaries for others to use.

betatim avatar Oct 20 '17 19:10 betatim

You can try to follow what is being done in this lib: https://github.com/kimwalisch/libpopcnt (they detect at runtime what is available). Not sure how scalable the solution is for more instructions, and not even sure if it is a good idea (since we want to let the compiler take care of it), but I thought it was worth throwing this here.
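As a loose analogy only (this is not how libpopcnt itself works, which queries the CPU from C), the "detect once at runtime, then dispatch" pattern can be sketched in Python. The sketch assumes that int.bit_count, available from Python 3.10, stands in for the fast path; popcount_fallback is a made-up name for the portable version.

```python
def popcount_fallback(value):
    """Portable bit count: clear the lowest set bit until none remain."""
    value = abs(value)  # mirror int.bit_count, which counts bits of abs(x)
    count = 0
    while value:
        value &= value - 1
        count += 1
    return count


# Pick the implementation once, at import time, based on what is available.
if hasattr(int, "bit_count"):
    popcount = int.bit_count  # fast built-in path (Python 3.10+)
else:
    popcount = popcount_fallback  # portable path for older interpreters


if __name__ == "__main__":
    print(popcount(0b1011))
```

The scalability concern raised above shows up here too: every additional fast path needs its own check and its own fallback.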

luizirber avatar Oct 31 '17 18:10 luizirber

Can setup.py execute some minimal code at compile time to detect the architecture and adjust options accordingly?
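One way this could look, as a rough sketch rather than a tested khmer change: instead of identifying the architecture, probe whether the compiler accepts a given flag by compiling a trivial file. The function names and the default compiler command "cc" are assumptions for illustration; a real setup.py would need to use whichever compiler setuptools actually selects.

```python
import os
import subprocess
import tempfile


def compiler_accepts_flag(flag, compiler="cc"):
    """Return True if `compiler` compiles a trivial C file with `flag`.

    Returns False when the compiler is missing or rejects the flag.
    """
    with tempfile.TemporaryDirectory() as tmpdir:
        source = os.path.join(tmpdir, "probe.c")
        output = os.path.join(tmpdir, "probe.o")
        with open(source, "w") as handle:
            handle.write("int main(void) { return 0; }\n")
        try:
            result = subprocess.run(
                [compiler, flag, "-c", source, "-o", output],
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
        except OSError:
            # Compiler executable not found or not runnable.
            return False
        return result.returncode == 0


def probed_compile_args(compiler="cc"):
    """Hypothetical helper: add -march=native only if the probe succeeds."""
    if compiler_accepts_flag("-march=native", compiler):
        return ["-march=native"]
    return []
```

The result could then be passed as extra_compile_args to the setuptools Extension, so that an unsupported flag degrades to a plain build instead of a failed one.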

This is getting into the realm of hairy limited-shelf-life-solutions, admittedly.

standage avatar Oct 31 '17 18:10 standage

Right now I think -march=native would be good enough for most people (and the speedups seem to be small anyway, so maybe not worth adding too much magic?). We could add some if statements to setup.py to detect when we are building wheels/binaries for distribution, in which case we turn it off or set it to whatever the recommended arch is.
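A minimal sketch of what those if statements might look like. The environment variable names KHMER_MARCH and KHMER_DIST_BUILD are hypothetical (nothing in khmer defines them), and an explicit variable is used rather than inspecting the setup.py command, on the assumption that pip may build a wheel locally even for a source install.

```python
import os


def choose_march_args(environ):
    """Pick -march compiler arguments from environment settings.

    KHMER_MARCH and KHMER_DIST_BUILD are hypothetical variable names
    used only for this illustration.
    """
    march = environ.get("KHMER_MARCH")
    if march:
        # The user explicitly asked for a specific architecture.
        return ["-march=" + march]
    if environ.get("KHMER_DIST_BUILD") == "1":
        # Building binaries for other machines: stay with compiler defaults.
        return []
    # Local build: tune for the machine doing the compiling.
    return ["-march=native"]


if __name__ == "__main__":
    print(choose_march_args(os.environ))
```

In setup.py the returned list would go into the Extension's extra_compile_args. A release script would set KHMER_DIST_BUILD=1, while a user who wants a specific target could run something like KHMER_MARCH=skylake pip install with a source build.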

betatim avatar Nov 01 '17 09:11 betatim