Byte-based random access
I have not found the support forum. If there is one, feel free to point me to it.
I have a binary file with variable-length records. I can compute the start position of each record. How can I generate an index of the file based on these positions? How can I extract records 1000..1007 and 2000..2030?
Thanks for your question; this is indeed the best place to ask.
The XZ index is automatically created by pixz, and does include the data you need to do the kind of random access you want. However, the command-line interface for pixz currently only supports extracting files from tarballs, not generalized random access.
If you want to add such a feature, patches are welcome! Your code would use the lzma_index_iter_locate() function to seek to uncompressed locations within the XZ file.
Alternatively, you could use my project lzopfs: https://github.com/vasi/lzopfs. It lets you use an XZ file generated by pixz as if it were uncompressed, including random access:
# Access 512-byte blocks 1000-1007 of a compressed file (512 bytes is dd's default block size)
./lzopfs myfile.pxz mntpoint
dd if=mntpoint/myfile skip=1000 count=8 > output
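Since your records are variable-length, you would want byte offsets rather than dd's fixed-size blocks. Here is a rough sketch of how that could look on top of the mounted file. It is not part of pixz or lzopfs: the helper name extract_records and the offsets.txt file are made up for illustration, and I am assuming you write your computed start positions one per line (line N is the start of record N, with one extra final line holding the total file size). It also assumes your head supports -c, which GNU and BSD versions do.

```shell
# Hypothetical helper: print the bytes of records FIRST..LAST from FILE.
# OFFSETS is a text file with one decimal byte offset per line; line N holds
# the start of record N (1-based), and a final extra line holds the file size.
extract_records() {
    file=$1 offsets=$2 first=$3 last=$4
    # Start of the first wanted record, and start of the record after the last.
    start=$(sed -n "${first}p" "$offsets")
    end=$(sed -n "$((last + 1))p" "$offsets")
    # tail -c +K begins output at byte K (1-based); head -c keeps end-start bytes.
    tail -c "+$((start + 1))" "$file" | head -c "$((end - start))"
}

# Usage against the mounted file (paths are placeholders):
#   extract_records mntpoint/myfile offsets.txt 1000 1007 >  output
#   extract_records mntpoint/myfile offsets.txt 2000 2030 >> output
```

Whether tail seeks directly to the offset or reads up to it depends on the implementation, so treat this as a starting point rather than a tuned solution.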