mapping performance on AMD much worse than on Intel CPUs

Open balwierz opened this issue 5 years ago • 5 comments

I am mapping paired-end data with these parameters: --maxins 5000 --phred33-quals --threads 8 --no-discordant. I have a couple of Debian servers for this. Some of them have an AMD Opteron(tm) Processor 6380 (4 years old) and some have an Intel(R) Xeon(R) Gold 6154 CPU @ 3.00GHz (2 years old).

Mapping on AMD is ~10x slower than on Intel. I have ruled out memory and storage issues. I am using the Debian binaries, version 2.3.4.3.

What could be the reason for such a drastic difference? Are there some specific CPU optimisations which can be chosen at compile time? I can see that gcc flags are -msse2 -O3 -m64 -funroll-loops -g3
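On the compile-time question, one cheap first step is to ask each machine which instruction-set extensions its CPU advertises, since a binary built with only -msse2 cannot assume anything newer. The sketch below uses the GCC/Clang builtins __builtin_cpu_init and __builtin_cpu_supports. The function name cpu_feature_report is made up for illustration and is not part of bowtie2, and the choice of features to query is an assumption about what might matter here.

```cpp
#include <string>

// Hypothetical helper (not part of bowtie2): reports a few x86 features
// that the running CPU advertises, using GCC/Clang builtins.
std::string cpu_feature_report() {
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();  // set up the feature table before querying it
    std::string out;
    // __builtin_cpu_supports needs a string literal, so each query is spelled out.
    out += std::string("sse2=")    + (__builtin_cpu_supports("sse2")   ? "yes" : "no");
    out += std::string(" sse4.2=") + (__builtin_cpu_supports("sse4.2") ? "yes" : "no");
    out += std::string(" popcnt=") + (__builtin_cpu_supports("popcnt") ? "yes" : "no");
    out += std::string(" avx2=")   + (__builtin_cpu_supports("avx2")   ? "yes" : "no");
    return out;
#else
    // Other compilers or non-x86 targets: nothing to query with these builtins.
    return "unknown (non-x86 target or unsupported compiler)";
#endif
}
```

Printing the result of cpu_feature_report() on both the Opteron and the Xeon hosts would show whether the two machines differ in any of these features.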

balwierz, Apr 15 '19 12:04

Oddly enough I was talking about this today.

Unfortunately it looks like they've opted to use the Intel-specific POPCNT instruction, as opposed to the platform-agnostic one (I want to suggest something like __builtin_popcount() off the top of my head, but I've not touched C++ in years).

Edit: Had a bit of a think and I'm not sure I'm right. I understand that the current headers used (Intel's SSE 4.2) should compile down to exactly the same code as the standard GCC extension, so I doubt it's that.
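For reference, a minimal sketch of the two approaches being compared, assuming GCC or Clang. The function names are invented for illustration and are not taken from the bowtie2 source. As far as I know, GCC only turns __builtin_popcountll into a single POPCNT instruction when the target allows it (for example with -mpopcnt); otherwise it falls back to a slower library routine, which may be relevant given the -msse2 flag above.

```cpp
#include <cstdint>

// Portable fallback: clears the lowest set bit on each pass (Kernighan's
// method), so the loop runs once per set bit.
inline int popcount_portable(std::uint64_t x) {
    int n = 0;
    while (x != 0) {
        x &= x - 1;  // drop the lowest set bit
        ++n;
    }
    return n;
}

// Compiler builtin where available; the compiler decides whether this
// becomes a hardware instruction or a library call.
inline int popcount_builtin(std::uint64_t x) {
#if defined(__GNUC__) || defined(__clang__)
    return __builtin_popcountll(x);
#else
    return popcount_portable(x);
#endif
}
```

Both functions return the same counts; only the generated machine code differs, which is what would need to be compared on the two CPUs.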

KeironO, Aug 29 '19 23:08

We made some changes to the bt2_cxx11 branch to always use the compiler's popcount implementation. Can you test whether this results in better performance?

ch4rr0, Nov 04 '19 20:11

I'm currently away from my office, but I can benchmark this when I get back.

KeironO, Nov 06 '19 17:11

Hello,

Have you gotten a chance to benchmark the changes?

ch4rr0, Nov 26 '19 14:11

Would be great to get a benchmarking script contributed to this repository!
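As a starting point only, here is a rough sketch of a timing harness in C++. It is not a bowtie2 benchmark: it times an arbitrary callable and keeps the best of several runs, and the bit-counting workload is just a stand-in. The names count_bits and best_seconds are hypothetical.

```cpp
#include <chrono>
#include <cstdint>
#include <vector>

// Stand-in workload: total number of set bits across a buffer of words.
inline std::uint64_t count_bits(const std::vector<std::uint64_t>& words) {
    std::uint64_t total = 0;
    for (std::uint64_t w : words) {
        while (w != 0) {
            w &= w - 1;  // drop the lowest set bit
            ++total;
        }
    }
    return total;
}

// Runs fn() `repeats` times and returns the fastest wall-clock time in
// seconds. Taking the minimum reduces noise from other processes.
// Returns -1.0 if repeats is not positive.
template <class F>
double best_seconds(F&& fn, int repeats) {
    double best = -1.0;
    for (int i = 0; i < repeats; ++i) {
        const auto start = std::chrono::steady_clock::now();
        fn();
        const auto stop = std::chrono::steady_clock::now();
        const double elapsed = std::chrono::duration<double>(stop - start).count();
        if (best < 0.0 || elapsed < best) {
            best = elapsed;
        }
    }
    return best;
}
```

A real comparison would time an actual bowtie2 run with the same reads and index on each machine; this harness is only meant for isolating small kernels such as the popcount code.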

mr-c, Dec 19 '19 17:12