
Extend max supported counter value

Open marekkokot opened this issue 6 years ago • 0 comments

Currently kmc (except for small k) and kmc_tools store k-mer counters as uint32_t, which is reasonable for k-mer counts, but the counter field in a kmc_database may also be used for other purposes (like #58). In such a case a wider type (uint64_t or maybe even more) could be helpful. The suggestion from notestaff is worth considering as an alternative.

One other option is if you could natively support treating counters as indices into a lookup table where actual values are stored. Then operations on counters could be turned into operations on the referenced values. You'd keep a hash-based map from value to index in the table, so that if an operation produces a value not yet seen, it's added to the table. Then counters could represent sets of samples, with set operations for combining the values; but also any other type that doesn't fit into 32 bits, as long as the actual number of different values seen is under 2^32 (true for sets of samples).
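
The lookup-table idea above can be sketched roughly as follows. This is only an illustration, not KMC code: `ValueTable` and `union_counters` are hypothetical names, and the sketch assumes a sample set is encoded as a 64-bit bitmask (one bit per sample, so at most 64 samples), which is just one possible value type that does not fit into 32 bits.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <unordered_map>
#include <vector>

// Hypothetical helper (not part of KMC): interns 64-bit values and hands out
// 32-bit indices that fit into the existing uint32_t counter field.
class ValueTable {
public:
    // Return the index of `value`, adding it to the table if not yet seen.
    uint32_t intern(uint64_t value) {
        auto it = index_of_.find(value);
        if (it != index_of_.end())
            return it->second;
        // The next index must still be representable as uint32_t.
        if (values_.size() > std::numeric_limits<uint32_t>::max())
            throw std::overflow_error("more than 2^32 distinct values");
        uint32_t idx = static_cast<uint32_t>(values_.size());
        values_.push_back(value);
        index_of_.emplace(value, idx);
        return idx;
    }

    // Look up the actual value behind a stored counter (throws if out of range).
    uint64_t value(uint32_t index) const { return values_.at(index); }

    std::size_t size() const { return values_.size(); }

private:
    std::vector<uint64_t> values_;                     // index -> value
    std::unordered_map<uint64_t, uint32_t> index_of_;  // value -> index
};

// An operation on counters becomes an operation on the referenced values:
// here, the union of two sample sets (bitwise OR of the assumed bitmasks).
uint32_t union_counters(ValueTable& table, uint32_t a, uint32_t b) {
    return table.intern(table.value(a) | table.value(b));
}
```

With this layout the database format would not need to change: each k-mer still carries a 32-bit field, and only the table has to be stored alongside it. Combining the same two counters twice yields the same index, because the resulting value is found in the map rather than added again.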

marekkokot, May 24 '18 09:05