tsinfer

Speed up bit unpacking in `generate_ancestors`

Open benjeffery opened this issue 1 year ago • 1 comments

There are a few ways to go about this; for example, some CPUs have specific instructions for it. After some research, the most portable and robust way appears to be multiplying the bit-packed byte by a constant so that each bit lands in the right place in a 64-bit word, something like:

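#include <stdint.h>
#include <string.h>

/* Unpack len bytes into 8 * len values of 0 or 1, most significant bit first. */
void unpackbits(const uint8_t *restrict source, size_t len, int8_t *restrict dest) {
    const uint64_t MAGIC = 0x8040201008040201ULL;
    const uint64_t MASK  = 0x8080808080808080ULL;
    size_t dest_index = 0;
    for (size_t i = 0; i < len; i++) {
        /* After the mask and shift, each byte of t holds one bit of source[i]. */
        uint64_t t = ((MAGIC*source[i]) & MASK) >> 7;
        /* memcpy avoids an unaligned, aliasing-unsafe store through a cast;
           the resulting byte order assumes a little-endian host. */
        memcpy(&dest[dest_index], &t, sizeof(t));
        dest_index += 8;
    }
}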

Care needs to be taken, for example, that the length of the dest array is a multiple of 8 (it must hold 8 * len entries), because every input byte is written out as a full 8-byte word. This results in a ~25% speed up for 1kg.
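The multiply works because the constant has one set bit every 9 positions, so the product is eight copies of the input byte shifted by 0, 9, 18, ... bits. The copies never overlap, and the mask picks out bit 7 of each output byte, which holds a different input bit each time. As a rough sketch of how the dest-length caveat might be handled, the version below unpacks whole bytes with the multiply and finishes any trailing bits with a plain shift loop, alongside a naive reference to check it against. The names `unpack_bits_multiply`, `unpack_bits_naive` and the `num_bits` parameter are hypothetical and not taken from tsinfer, and the code assumes a little-endian host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reference implementation: one shift and mask per output value,
 * most significant bit of each source byte first. */
static void
unpack_bits_naive(const uint8_t *source, size_t num_bits, int8_t *dest)
{
    for (size_t j = 0; j < num_bits; j++) {
        dest[j] = (int8_t) ((source[j / 8] >> (7 - (j % 8))) & 1);
    }
}

/* Multiply-based variant (hypothetical helper, not tsinfer's API).
 * Assumes a little-endian host: byte k of the 64-bit word t is stored
 * at dest[8 * i + k]. dest must have room for num_bits entries. */
static void
unpack_bits_multiply(const uint8_t *source, size_t num_bits, int8_t *dest)
{
    const uint64_t MAGIC = 0x8040201008040201ULL;
    const uint64_t MASK = 0x8080808080808080ULL;
    size_t full_bytes = num_bits / 8;

    assert(num_bits == 0 || (source != NULL && dest != NULL));

    /* Fast path: each whole source byte expands to exactly 8 outputs. */
    for (size_t i = 0; i < full_bytes; i++) {
        uint64_t t = ((MAGIC * source[i]) & MASK) >> 7;
        memcpy(dest + 8 * i, &t, sizeof(t));
    }
    /* Tail: fewer than 8 outputs remain, so writing a full 8-byte word
     * here would overrun dest. Fall back to per-bit shifts. */
    for (size_t j = 8 * full_bytes; j < num_bits; j++) {
        dest[j] = (int8_t) ((source[j / 8] >> (7 - (j % 8))) & 1);
    }
}
```

Comparing the two functions on inputs whose bit count is not a multiple of 8 is a cheap way to confirm both the bit order and the tail handling before benchmarking.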

benjeffery, Mar 29 '23 12:03