silx
Provide Bitshuffle-LZ4 codec in OpenCL
Similar to what is already done for the byte-offset algorithm.
- [x] Decompression
- [ ] Compression
Design of the compressor: Since LZ4 is coupled with bitshuffle in our target application, we focus on repetitions of 1-byte patterns.
Each workgroup works on 16 kB of data, which matches the cache size used on CPU. This makes 256 workgroups for a 4 MB image. Collective work: load 16 kB (size to be checked) of data into shared memory.
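The workgroup count above can be checked with a few lines of host-side Python. This is only a sketch of the arithmetic: the names `CHUNK_SIZE` and `chunk_bounds` are made up for illustration, and "16k" and "4M" are assumed to mean 16 KiB and 4 MiB.

```python
CHUNK_SIZE = 16 * 1024  # bytes handled by one workgroup (assumed to be 16 KiB)


def chunk_bounds(nbytes, chunk_size=CHUNK_SIZE):
    """Return one (start, stop) byte range per workgroup.

    The last range is shorter when nbytes is not a multiple of chunk_size.
    """
    return [(start, min(start + chunk_size, nbytes))
            for start in range(0, nbytes, chunk_size)]


# A 4 MiB image is split into 256 chunks of 16 KiB each.
bounds = chunk_bounds(4 * 1024 * 1024)
print(len(bounds))
```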
Search-repetition: Each thread starts at its own position and counts how many times its value is repeated going forward in the input stream. Store the number of repetitions in shared memory. Decrement a shared counter to determine when all threads have finished their analysis.
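To make the search step concrete, here is a sequential Python sketch of what each thread would compute. It is a simulation, not OpenCL: the list comprehension stands in for the threads of one workgroup, and the function names are hypothetical. It also assumes the count includes the thread's own byte, which the description above leaves open.

```python
def run_length_at(chunk, pos):
    """Work of a single thread: count the bytes equal to chunk[pos],
    starting at pos and scanning forward (the byte at pos is included)."""
    value = chunk[pos]
    end = pos + 1
    while end < len(chunk) and chunk[end] == value:
        end += 1
    return end - pos


def search_repetitions(chunk):
    """Simulate the whole workgroup: one result per position,
    which the kernel would store in shared memory."""
    return [run_length_at(chunk, pos) for pos in range(len(chunk))]


print(search_repetitions(b"abbbbc"))  # [1, 4, 3, 2, 1, 1]
```

In a real kernel the forward scan of each thread is bounded by the end of the 16 kB chunk, so the worst case (a constant chunk) makes every thread scan to the end.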
One thread analyzes the repetition ... when the repetition length increases sharply (>5):
- Store all data up to that point as literals (collective copy)
- Record how many times to repeat the last character of the literal section.

Restart the repetition search where it ended previously, until the 16 kB are consumed.
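The segmentation loop described above can be sketched sequentially as follows. This is an illustration under assumptions, not the planned kernel: `segment` and `expand` are hypothetical names, the output is a plain list of `(literals, repeat)` pairs rather than the actual LZ4 block format, and the threshold is read as "a run of more than 5 identical bytes".

```python
MIN_RUN = 5  # runs strictly longer than this are encoded as a repeat (assumed reading of ">5")


def segment(chunk, min_run=MIN_RUN):
    """Split chunk into (literals, repeat) pairs.

    `literals` ends with the first byte of a long run, and `repeat` is the
    number of extra copies of that last byte. Short runs stay in the literals.
    """
    segments = []
    lit_start = 0
    pos = 0
    n = len(chunk)
    while pos < n:
        end = pos + 1
        while end < n and chunk[end] == chunk[pos]:
            end += 1
        if end - pos > min_run:
            segments.append((bytes(chunk[lit_start:pos + 1]), end - pos - 1))
            lit_start = end  # restart the search where the run ended
        pos = end
    if lit_start < n:
        segments.append((bytes(chunk[lit_start:]), 0))  # trailing literals
    return segments


def expand(segments):
    """Inverse of segment(), used here only to check the round trip."""
    out = bytearray()
    for literals, repeat in segments:
        out += literals
        out += literals[-1:] * repeat
    return bytes(out)


data = b"abc" + b"\x00" * 10 + b"de"
print(segment(data))  # [(b'abc\x00', 9), (b'de', 0)]
assert expand(segment(data)) == data
```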
Write how many output bytes are needed into a buffer.
An atomic decrement of a shared counter should determine which workgroup finishes last. That workgroup performs the cumulative sum (cumsum) that gives each workgroup's start position in the compacted buffer.
Run a final kernel which compacts the different buffers.
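The last two steps amount to an exclusive prefix sum over the per-workgroup output sizes, followed by a copy of each buffer to its offset. A minimal host-side sketch, assuming each workgroup has produced its own `bytes` object (the names `start_offsets` and `compact` are illustrative only):

```python
from itertools import accumulate


def start_offsets(sizes):
    """Exclusive cumulative sum: offset of each workgroup's output
    in the compacted buffer."""
    return list(accumulate(sizes, initial=0))[:-1]


def compact(buffers):
    """Copy every per-workgroup buffer to its offset in one contiguous buffer."""
    sizes = [len(buf) for buf in buffers]
    offsets = start_offsets(sizes)
    out = bytearray(sum(sizes))
    for buf, offset in zip(buffers, offsets):
        out[offset:offset + len(buf)] = buf
    return bytes(out), offsets


print(compact([b"ab", b"", b"cde"]))  # (b'abcde', [0, 2, 2])
```

On the device, each copy in the loop would be done by one workgroup of the final kernel, since the offsets make the writes independent of each other.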