php-archive
PAX typeFlag 'x'
I have encountered an issue when adding files with names like "._4слайд-150x150.jpg": the Linux tar utility marks them with typeflag x, which is similar to the GNU LongLink typeflag L. This breaks archive extraction, and the resulting error is "Header does not match it's checksum for".
Since the Tar class supports the ustar format, it seems it should be able to handle the pax type as well, so a quick fix would be to replace this code
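To illustrate why an unhandled x entry ends in a checksum error, here is a minimal sketch of ustar checksum verification. It assumes the standard ustar layout (512-byte header, 8-byte checksum field at offset 148, counted as spaces when summing). The function names are hypothetical and not part of php-archive. If the reader does not skip the data block of the x entry, the pax records in it get treated as the next header, and the check fails.

```php
<?php
// Hypothetical helpers, not php-archive API: a sketch of the ustar checksum rule.

/**
 * Sum all 512 header bytes, counting the 8-byte checksum field
 * (offsets 148..155) as ASCII spaces.
 */
function tarHeaderChecksum(string $block): int
{
    $sum = 0;
    for ($i = 0; $i < 512; $i++) {
        $sum += ($i >= 148 && $i < 156) ? 32 : ord($block[$i]);
    }
    return $sum;
}

/**
 * Compare the computed sum with the octal value stored in the header.
 */
function tarHeaderChecksumMatches(string $block): bool
{
    if (strlen($block) !== 512) {
        return false;
    }
    // the stored value is octal text, padded with NUL and/or space
    $stored = (int) octdec(trim(substr($block, 148, 8)));
    return $stored === tarHeaderChecksum($block);
}
```

A block holding pax records such as "18 path=dir/a.txt\n" has no valid octal value at offset 148, so a check like this rejects it.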
// Handle Long-Link entries from GNU Tar
if ($return['typeflag'] == 'L') {
    // following data block(s) is the filename
    $filename = trim($this->readbytes(ceil($header['size'] / 512) * 512));
    // next block is the real header
    $block = $this->readbytes(512);
    $return = $this->parseHeader($block);
    // overwrite the filename
    $return['filename'] = $filename;
}
with
// Handle Long-Link entries from GNU Tar and pax extended headers (typeflag x)
if ($return['typeflag'] == 'L' || $return['typeflag'] == 'x') {
    // remember the entry type, $return is replaced by the real header below
    $typeflag = $return['typeflag'];
    // following data block(s) hold the filename (L) or the pax records (x)
    $filename = trim($this->readbytes(ceil($header['size'] / 512) * 512));
    // next block is the real header
    $block = $this->readbytes(512);
    $return = $this->parseHeader($block);
    // overwrite the filename for Long-Link entries only
    if ($typeflag == 'L') {
        $return['filename'] = $filename;
    }
}
in the protected function parseHeader($block)
I have tested this and it works fine for processing records with typeflag x. Should I open a pull request?
I am also attaching the tgz archive I used for testing: test.tgz.zip
Interesting. It would be good to have a pointer to the documentation on what the difference between L and x is. There might be additional stuff that needs to be done for x types.
Please open a pull request and include a test case.
The pax header type is defined here (see XHDTYPE): https://www.gnu.org/software/tar/manual/html_node/Standard.html
I found a Python implementation of the tar utility here: https://svn.python.org/projects/python/tags/r31/Lib/tarfile.py . Check the def create_pax_header(self, info): method. Basically, it checks whether the filename contains non-ASCII characters and, if it does, creates a pax header. I will look into it more as well.
Pull request is here https://github.com/splitbrain/php-archive/pull/19
I found some more details explaining the pax extended headers here: https://www.ibm.com/support/knowledgecenter/SSLTBW_1.13.0/com.ibm.zos.r13.bpxa500/pxarchfm.htm#paxex
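As a rough illustration of what reading an x entry properly would involve, the sketch below parses the data block(s) of a pax extended header. It relies on the record format from the pax specification, "<length> <keyword>=<value>\n", where the decimal length counts the whole record in bytes, including the length digits, the space, and the trailing newline. The function name parsePaxRecords is hypothetical and not part of php-archive.

```php
<?php
/**
 * Hypothetical helper, not php-archive API: parse pax extended header data
 * into an array of keyword => value pairs.
 *
 * $data is the raw content of the data block(s), NUL padding included.
 */
function parsePaxRecords(string $data): array
{
    $records = [];
    $offset = 0;
    $total = strlen($data);

    while ($offset < $total) {
        // the length field ends at the first space
        $space = strpos($data, ' ', $offset);
        if ($space === false) {
            break; // only padding is left
        }
        $lenField = substr($data, $offset, $space - $offset);
        if (!ctype_digit($lenField)) {
            break; // not a valid record, stop parsing
        }
        $length = (int) $lenField;
        $digits = $space - $offset;
        // need room for digits, space, at least "k=" and the newline
        if ($length < $digits + 4 || $offset + $length > $total) {
            break;
        }
        // record body without the length field, the space and the newline
        $body = substr($data, $space + 1, $length - $digits - 2);
        $eq = strpos($body, '=');
        if ($eq !== false) {
            $records[substr($body, 0, $eq)] = substr($body, $eq + 1);
        }
        $offset += $length;
    }

    return $records;
}
```

For a file like the one in this issue, the interesting keyword would be path, which carries the full UTF-8 filename that should replace the name from the following ustar header.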
If I may add 2 cents to this discussion: It seems that there may also be g blocks that contain globally applicable pax data... :grimacing:
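A small sketch of how the precedence could work, based on the pax specification: records from a per-file x header override records from an earlier g header, and both override the fields of the ustar header itself. The function name applyPaxOverrides is hypothetical. Apart from filename and typeflag, which appear in the code above, the header array keys are assumptions about the parsed header.

```php
<?php
/**
 * Hypothetical helper, not php-archive API: apply pax records to a parsed
 * ustar header. $global holds records from the last g header, $local the
 * records from the x header that directly precedes this file.
 */
function applyPaxOverrides(array $header, array $global, array $local): array
{
    // pax keyword => key in the parsed header array (keys are assumptions)
    $map = [
        'path'     => 'filename',
        'linkpath' => 'link',
        'size'     => 'size',
        'uid'      => 'uid',
        'gid'      => 'gid',
        'uname'    => 'uname',
        'gname'    => 'gname',
        'mtime'    => 'mtime',
    ];
    $numeric = ['size', 'uid', 'gid', 'mtime'];

    // later arrays win in array_merge, so per-file records beat global ones
    $records = array_merge($global, $local);

    foreach ($map as $keyword => $field) {
        if (!isset($records[$keyword])) {
            continue;
        }
        $value = $records[$keyword];
        // pax mtime may carry a fraction such as "1.5"; keep whole seconds
        $header[$field] = in_array($field, $numeric, true)
            ? (int) (float) $value
            : $value;
    }

    return $header;
}
```

Records from a g header would have to be kept around while reading the archive, since they apply to every following file until a later g header replaces them.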
By the way, the posted IBM link is dead. Here is a working one: https://www.ibm.com/docs/en/zos/2.4.0?topic=SSLTBW_2.4.0/com.ibm.zos.v2r4.bpxa500/paxhead.htm#paxhead
We also have a real-world need for pax support: https://github.com/dennis-eisen/CT_AutoUpdater/issues/8 However, I totally understand how limited resources are in FOSS projects like this one, and unfortunately I don't have the time to do it myself. :(