ntHash
ntHash copied to clipboard
AVX2 and AVX512 implementations for ntHash
Since the recurrent version of ntHash is a prefix sum with an associative operator, it is possible to compute the hashes in parallel using the prefix scan algorithm. For 64-bit hash the theoretical speedup in 2x for AVX2 and 3x for AVX512. Versions for 32-bit hashes are also included featuring even greater speedup.
Hi, @mirounga. Cool! Does this code implement ntHash v1 or ntHash v2?
The code implements the current version of ntHash. I have used the scalar version for validation
Great. Thanks for confirming, @mirounga.
Hi @mirounga
I'm getting the following runtime performance results:
[hmohamadi@hpcg01 ntHash_avx]$ ./nttest_avx -k50 ERR294494_1.part1.fastq
CPU time (sec) for hash algorithms for kmer=50
nthash ntavx2 ntavx232 ntavx512 ntavx532
6.72 2.14 2.25 6.84 6.98
The proper AVX512 and AVX512x32 implementations are not included in nttest_avx.cpp
:
void hashSeqAvx512(const string & seq) {
uint64_t fhVal, rhVal, hVal;
hVal = NTC64(seq.c_str(), opt::kmerLen, fhVal, rhVal);
if (hVal)opt::nz++;
for (size_t i = 1; i < seq.length() - opt::kmerLen + 1; i++) {
hVal = NTC64(seq[i - 1], seq[i - 1 + opt::kmerLen], opt::kmerLen, fhVal, rhVal);
if (hVal)opt::nz++;
}
}
void hashSeqAvx512x32(const string & seq) {
uint64_t fhVal, rhVal, hVal;
hVal = NTC64(seq.c_str(), opt::kmerLen, fhVal, rhVal);
if (hVal)opt::nz++;
for (size_t i = 1; i < seq.length() - opt::kmerLen + 1; i++) {
hVal = NTC64(seq[i - 1], seq[i - 1 + opt::kmerLen], opt::kmerLen, fhVal, rhVal);
if (hVal)opt::nz++;
}
}
Hi @mohamadi , fixed the AVX512 functions. When measuring performance please subtract the cost of IO - it is dominating now.
Closing due to being related to ntHash's old codebase. New ports are welcome!
Nice work there! I've expanded this unmerged PR into its own repository, in order to better assess the correctness of results, (keeping the same codebase): https://github.com/rchikhi/ntHash-AVX512
@parham-k can you take a look at it and merge if compatible with the new release?
Sure! Thank you @rchikhi.