bowtie2
mapping performance on AMD much worse than on Intel CPUs
I am mapping paired-end data with these parameters: --maxins 5000 --phred33-quals --threads 8 --no-discordant. I have a couple of Debian servers for this. Some of them have AMD Opteron(tm) Processor 6380 CPUs (4 years old) and some have Intel(R) Xeon(R) Gold 6154 CPU @ 3.00GHz (2 years old).
Mapping on AMD is ~10x slower than on Intel. I have ruled out memory and storage issues. I am using the Debian binaries, version 2.3.4.3.
What could be the reason for such a drastic difference? Are there some specific CPU optimisations which can be chosen at compile time? I can see that gcc flags are -msse2 -O3 -m64 -funroll-loops -g3
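On the compile-time question, one quick way to see which instruction-set extensions a given set of flags actually enables is to look at the predefined macros that GCC and Clang set. The sketch below is only an illustration and is not part of bowtie2; the function name compiled_in_features is made up here. With only -msse2, as in the flags quoted above, __POPCNT__ would normally not be defined, so the compiler would not be free to emit the hardware POPCNT instruction.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not from bowtie2): lists the x86 extensions the
// compiler was allowed to use for this translation unit, based on the
// macros GCC and Clang predefine for -msse2, -msse4.2, -mpopcnt, -mavx2.
inline std::vector<std::string> compiled_in_features() {
    std::vector<std::string> features;
#ifdef __SSE2__
    features.push_back("sse2");
#endif
#ifdef __SSE4_2__
    features.push_back("sse4.2");
#endif
#ifdef __POPCNT__
    features.push_back("popcnt");
#endif
#ifdef __AVX2__
    features.push_back("avx2");
#endif
    return features;
}
```

Printing the returned list from builds made with different flags (for example -msse2 versus -msse4.2 or -march=native) shows what changes between them. It says nothing about whether those extensions explain the slowdown; that still needs a benchmark.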
Oddly enough I was talking about this today.
Unfortunately it looks like they've opted to use the Intel-specific POPCNT instruction, as opposed to the platform-agnostic one (I want to suggest something like __builtin_popcount() off the top of my head, but I've not touched C++ in years).
Edit: Had a little bit of a think and I'm not sure if I'm right. I understand that the headers currently used (Intel's SSE 4.2) should compile down to the exact same code as the standard GCC extension, so I doubt it's that.
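For reference, here is a minimal sketch of the two options being compared: the GCC/Clang builtin and a plain bit-twiddling fallback that needs no special instruction. This is not bowtie2's code, and the function names are invented for the example. As far as I know, the builtin is lowered to the hardware POPCNT instruction only when the target flags allow it (for example -mpopcnt or -msse4.2), and to a library routine otherwise.

```cpp
#include <cstdint>

// Portable fallback: classic SWAR bit counting, works on any CPU.
inline int popcount_portable(std::uint64_t x) {
    x = x - ((x >> 1) & 0x5555555555555555ULL);                           // 2-bit sums
    x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL); // 4-bit sums
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;                           // 8-bit sums
    return static_cast<int>((x * 0x0101010101010101ULL) >> 56);           // add the bytes
}

// Compiler-selected version: on GCC and Clang this uses the builtin, and the
// compiler decides whether to emit POPCNT or call a software routine.
inline int popcount_builtin(std::uint64_t x) {
#if defined(__GNUC__) || defined(__clang__)
    return __builtin_popcountll(x);
#else
    return popcount_portable(x);  // assumption: no builtin available elsewhere
#endif
}
```

Both functions return the same counts, so swapping one for the other only affects speed, not results.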
We made some changes to the bt2_cxx11 branch to always use the compiler's popcount implementation. Can you test whether this results in better performance?
I'm currently away from my office, but I can benchmark this when I get back.
Hello,
Have you gotten a chance to benchmark the changes?
Would be great to get a benchmarking script contributed to this repository!