Vadim Kantorov
Actually, I found that inlinable `url("../assets/...")` references are already used in the Playground code, so it may be worth just updating the CSS files in `./examples/*/`...
I wonder if we could design some explicit pack/unpack/load/store/index utility methods that would be enough for basic usage (like NumPy does with packbits/unpackbits). Maybe we could have some unpack method...
A simple interface could be packbits / unpackbits like in NumPy, with an additional bitness argument (to support 1-bit, 2-bit and 4-bit) and a dim argument. It should maybe also support an out argument...
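To make the proposal concrete, here is a rough sketch of what such packbits / unpackbits could look like as plain PyTorch reference code (these are hypothetical functions, not existing torch ops; bits are stored low-first within each byte, and the packed dim is zero-padded to a whole number of bytes):

```python
import torch
import torch.nn.functional as F

def packbits(x, bitness=1, dim=-1, out=None):
    # Hypothetical reference semantics: pack values of x (each assumed to fit
    # into `bitness` bits) along `dim` into uint8 bytes.
    x = x.movedim(dim, -1).to(torch.uint8)
    per_byte = 8 // bitness
    pad = (-x.shape[-1]) % per_byte
    if pad:
        x = F.pad(x, (0, pad))  # zero-pad to a whole number of bytes
    x = x.reshape(*x.shape[:-1], -1, per_byte)
    shifts = torch.arange(0, 8, bitness, dtype=torch.uint8, device=x.device)
    packed = (x << shifts).sum(dim=-1).to(torch.uint8)  # low bits first within a byte
    packed = packed.movedim(-1, dim)
    if out is not None:
        out.copy_(packed)
        return out
    return packed

def unpackbits(x, bitness=1, dim=-1):
    # Inverse of packbits: expand each byte along `dim` into 8 // bitness values.
    x = x.movedim(dim, -1)
    shifts = torch.arange(0, 8, bitness, dtype=torch.uint8, device=x.device)
    vals = (x.unsqueeze(-1) >> shifts) & ((1 << bitness) - 1)
    return vals.reshape(*x.shape[:-1], -1).movedim(-1, dim)
```

Usage would be something like `packed = packbits(t, bitness=2, dim=-1)` and `unpackbits(packed, bitness=2, dim=-1)` to round-trip (up to padding).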
https://www.microsoft.com/en-us/research/uploads/prod/2018/02/KoPhiliposeTashevZarar_ICASSP_2018.pdf suggests that XNOR and POPCNT functionality is useful for 1-bit networks
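For illustration, a minimal sketch of the XNOR/POPCNT trick on bit-packed data (`popcount_uint8` and `binary_dot` are hypothetical helpers; PyTorch has no built-in popcount op that I know of): the dot product of two {-1, +1} vectors reduces to n - 2 * popcount(a XOR b).

```python
import torch

def popcount_uint8(x):
    # Hypothetical helper: per-byte popcount via a 256-entry lookup table,
    # emulating the POPCNT instruction.
    table = torch.tensor([bin(i).count("1") for i in range(256)],
                         dtype=torch.uint8, device=x.device)
    return table[x.long()]

def binary_dot(a_packed, b_packed, n):
    # Dot product of two {-1, +1}^n vectors stored bit-packed (+1 -> 1, -1 -> 0):
    # dot = matches - mismatches = n - 2 * popcount(a XOR b),
    # assuming both operands were zero-padded identically when packed.
    mismatches = popcount_uint8(torch.bitwise_xor(a_packed, b_packed)).sum()
    return n - 2 * int(mismatches)
```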
Arguments that could be helpful for packbits/unpackbits: 1) `mask` - a bit-mask integer specifying a pack/unpack compress mask, like in the [compress](https://www.officedaytime.com/simd512e/simdimg/compress.php?f=vpcompressq) / [expand](https://www.officedaytime.com/simd512e/simdimg/expand.php?f=vexpandq) instructions -> this is slightly more flexible than...
In my code I'd do something like: `torch.packbits(something.argmax(dim = -1), mask = 0b11, dim = -1, out = my_uint8_array[k])`
I think one can view this feature request as surfacing SIMD compress/expand functionality to user land and reimplementing it on GPU
I thought that, given a mask, pack/unpack are precisely equivalent to SIMD compress/expand? Aren't they?
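To illustrate that equivalence (a sketch only, using a boolean mask tensor rather than a bit-mask integer): compress keeps the selected elements densely packed, and expand scatters them back into the masked positions.

```python
import torch

def compress(x, mask):
    # SIMD-style compress: keep elements of x whose mask bit is set, densely packed.
    return x[mask]

def expand(values, mask, fill=0):
    # SIMD-style expand: scatter values back into the positions where mask is set.
    out = torch.full(mask.shape, fill, dtype=values.dtype, device=values.device)
    out[mask] = values
    return out

x = torch.arange(8)
mask = torch.tensor([1, 0, 1, 1, 0, 0, 1, 0], dtype=torch.bool)
assert torch.equal(expand(compress(x, mask), mask)[mask], x[mask])
```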
The general int4 support request in https://github.com/pytorch/pytorch/issues/33859 is also related
Another useful piece of functionality would be scatter/gather-like ops for compressing index tensors. In a practical use case this can compress the indices of a hybrid sparse+dense tensor by a lot: https://discuss.pytorch.org/t/sparse-torch-topk/71832/4
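As a rough illustration of that saving (the sizes below are made up; `numpy.packbits` is real, while a bitness-aware torch op is not): int64 index values that fit into 4 bits shrink 16x once packed.

```python
import numpy as np

# Made-up sizes for illustration: 100k indices into a dense dim of size < 16,
# so each int64 index value fits into 4 bits.
idx = np.random.randint(0, 16, size=100_000, dtype=np.int64)     # 800,000 bytes

# Emulate a bitness=4 packbits with numpy: split each index into 4 bits, then pack.
bits = ((idx[:, None] >> np.arange(4)) & 1).astype(np.uint8)     # low-bit-first
packed = np.packbits(bits, axis=None)                            # 50,000 bytes

print(idx.nbytes, packed.nbytes)  # 16x smaller
```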