ntHash icon indicating copy to clipboard operation
ntHash copied to clipboard

AVX2 and AVX512 implementations for ntHash

Open mirounga opened this issue 5 years ago • 9 comments

Since the recurrent version of ntHash is a prefix sum with an associative operator, it is possible to compute the hashes in parallel using the prefix scan algorithm. For 64-bit hash the theoretical speedup in 2x for AVX2 and 3x for AVX512. Versions for 32-bit hashes are also included featuring even greater speedup.

mirounga avatar Nov 25 '18 19:11 mirounga

Hi, @mirounga. Cool! Does this code implement ntHash v1 or ntHash v2?

sjackman avatar Nov 26 '18 17:11 sjackman

The code implements the current version of ntHash. I have used the scalar version for validation

mirounga avatar Nov 27 '18 02:11 mirounga

Great. Thanks for confirming, @mirounga.

sjackman avatar Nov 27 '18 02:11 sjackman

Hi @mirounga

I'm getting the following runtime performance results:

[hmohamadi@hpcg01 ntHash_avx]$ ./nttest_avx -k50 ERR294494_1.part1.fastq 
CPU time (sec) for hash algorithms for kmer=50
nthash	ntavx2	ntavx232	ntavx512	ntavx532	
6.72	2.14	2.25	6.84	6.98

The proper AVX512 and AVX512x32 implementations are not included in nttest_avx.cpp:

void hashSeqAvx512(const string & seq) {
        uint64_t fhVal, rhVal, hVal;
        hVal = NTC64(seq.c_str(), opt::kmerLen, fhVal, rhVal);
        if (hVal)opt::nz++;
        for (size_t i = 1; i < seq.length() - opt::kmerLen + 1; i++) {
                hVal = NTC64(seq[i - 1], seq[i - 1 + opt::kmerLen], opt::kmerLen, fhVal, rhVal);
                if (hVal)opt::nz++;
        }
}

void hashSeqAvx512x32(const string & seq) {
        uint64_t fhVal, rhVal, hVal;
        hVal = NTC64(seq.c_str(), opt::kmerLen, fhVal, rhVal);
        if (hVal)opt::nz++;
        for (size_t i = 1; i < seq.length() - opt::kmerLen + 1; i++) {
                hVal = NTC64(seq[i - 1], seq[i - 1 + opt::kmerLen], opt::kmerLen, fhVal, rhVal);
                if (hVal)opt::nz++;
        }
}

mohamadi avatar Nov 28 '18 18:11 mohamadi

Hi @mohamadi , fixed the AVX512 functions. When measuring performance please subtract the cost of IO - it is dominating now.

mirounga avatar Nov 29 '18 20:11 mirounga

Closing due to being related to ntHash's old codebase. New ports are welcome!

parham-k avatar Sep 03 '22 16:09 parham-k

Nice work there! I've expanded this unmerged PR into its own repository, in order to better assess the correctness of results, (keeping the same codebase): https://github.com/rchikhi/ntHash-AVX512

rchikhi avatar Oct 08 '22 11:10 rchikhi

@parham-k can you take a look at it and merge if compatible with the new release?

mohamadi avatar Oct 11 '22 15:10 mohamadi

Sure! Thank you @rchikhi.

parham-k avatar Oct 11 '22 22:10 parham-k