Halide
Add minimal useful implementation of extracting and concatenating bits
This is a minimum viable, zero-cost way to load a vector of one type as a longer vector of a narrower type, or as a shorter vector of a wider type. It does this using two new intrinsics: concat_bits and extract_bits.
An interleave of extract_bits calls can simplify to loading a wider vector and reinterpreting it, and concatenating the bits of narrow strided loads can simplify to doing one larger dense load and reinterpreting it. There are probably other cases where we can do some clever simplification. More test cases welcome.
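To make the two simplifications concrete, here is a rough scalar model in plain C++. It is not the code in this PR and does not use Halide's API: the names concat_bits_model, extract_bits_model, widen_by_concat, widen_by_reinterpret, narrow_by_extract, and host_is_little_endian are all hypothetical. It also assumes that the first operand of a concatenation lands in the least significant bits, which matches a plain reinterpreting load only on a little-endian host.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical scalar models for illustration only; these are NOT Halide's API.
// Assumption: the first operand occupies the least significant bits.
inline uint32_t concat_bits_model(uint8_t b0, uint8_t b1, uint8_t b2, uint8_t b3) {
    return uint32_t(b0) | (uint32_t(b1) << 8) | (uint32_t(b2) << 16) | (uint32_t(b3) << 24);
}

// Return the 8 bits of `word` starting at bit position `lsb` (lsb must be < 32).
inline uint8_t extract_bits_model(uint32_t word, unsigned lsb) {
    return uint8_t(word >> lsb);
}

// True when the lowest-addressed byte of an integer holds its low bits.
inline bool host_is_little_endian() {
    const uint16_t probe = 1;
    uint8_t first = 0;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}

// Unsimplified form: four stride-4 byte loads, concatenated per output element.
std::vector<uint32_t> widen_by_concat(const std::vector<uint8_t> &src) {
    std::vector<uint32_t> out(src.size() / 4);
    for (size_t i = 0; i < out.size(); i++) {
        out[i] = concat_bits_model(src[4 * i], src[4 * i + 1], src[4 * i + 2], src[4 * i + 3]);
    }
    return out;
}

// Simplified form: one dense load of the same bytes, reinterpreted as uint32_t.
std::vector<uint32_t> widen_by_reinterpret(const std::vector<uint8_t> &src) {
    std::vector<uint32_t> out(src.size() / 4);
    if (!out.empty()) {
        std::memcpy(out.data(), src.data(), out.size() * sizeof(uint32_t));
    }
    return out;
}

// Opposite direction: interleave the four byte-sized extractions of each word.
std::vector<uint8_t> narrow_by_extract(const std::vector<uint32_t> &src) {
    std::vector<uint8_t> out(src.size() * 4);
    for (size_t i = 0; i < src.size(); i++) {
        for (unsigned k = 0; k < 4; k++) {
            out[4 * i + k] = extract_bits_model(src[i], 8 * k);
        }
    }
    return out;
}
```

On a little-endian host, widen_by_concat and widen_by_reinterpret return the same values for the same input, which is the sense in which the strided concatenation can collapse into a single dense load. narrow_by_extract inverts widen_by_concat regardless of host byte order.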
This PR doesn't contain any syntactic sugar for reinterpreting Func types. For the user, there's just a fixed idiom that can be used to get the desired effect.