php-archive

PAX typeFlag 'x'

Open ovidiul opened this issue 7 years ago • 6 comments

I have encountered an issue when adding files with names like "._4слайд-150x150.jpg": the Linux tar utility marks them with typeFlag x, which is similar to the LongLink typeFlag L. This breaks archive extraction, and the generated error is "Header does not match it's checksum for".
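As background on that error message: in the tar format, the header checksum is the sum of all 512 header bytes, with the 8-byte checksum field itself (offset 148) counted as ASCII spaces. The following is a minimal sketch of that calculation, not the code php-archive actually uses; `tarHeaderChecksum` is a hypothetical name.

```php
<?php
// Sketch only: compute the checksum of one 512-byte tar header block.
// tarHeaderChecksum() is a hypothetical helper, not part of php-archive.
function tarHeaderChecksum(string $block): int
{
    if (strlen($block) !== 512) {
        throw new InvalidArgumentException('A tar header block is exactly 512 bytes');
    }
    // Bytes 148..155 hold the stored checksum; they count as eight spaces.
    $block = substr_replace($block, str_repeat(' ', 8), 148, 8);
    // Sum every byte as an unsigned value.
    return array_sum(unpack('C*', $block));
}
```

A reader would compare this value with the octal number stored in the checksum field and reject the block if the two differ.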

Since the Tar class supports the ustar format, it seems it should support the pax type as well, so a quick fix would be to replace this code

// Handle Long-Link entries from GNU Tar
        if ($return['typeflag'] == 'L' ) {
            // following data block(s) is the filename
            $filename = trim($this->readbytes(ceil($header['size'] / 512) * 512));
            // next block is the real header
            $block  = $this->readbytes(512);
            $return = $this->parseHeader($block);

            // overwrite the filename
            $return['filename'] = $filename;
        }

with

// Handle Long-Link entries from GNU Tar and pax extended headers
        if ($return['typeflag'] == 'L' || $return['typeflag'] == 'x') {
            // remember the type now, because $return is replaced below
            $typeflag = $return['typeflag'];
            // following data block(s) is the filename (L) or the pax records (x)
            $filename = trim($this->readbytes(ceil($header['size'] / 512) * 512));
            // next block is the real header
            $block  = $this->readbytes(512);
            $return = $this->parseHeader($block);

            // overwrite the filename only for Long-Link entries
            if ($typeflag == 'L') {
                $return['filename'] = $filename;
            }
        }

in the protected function parseHeader($block)

I have tested this and it works fine for processing records with typeFlag x. Should I open a pull request?

I am also attaching the tgz archive I used for testing: test.tgz.zip

ovidiul • Mar 13 '17 07:03

Interesting. Would be good to have some pointer to the documentation on what the difference between L and x is. There might be additional stuff that needs to be done for x types.

Please open a pull request and include a test case.

splitbrain • Mar 13 '17 10:03

The PAX header is defined here: https://www.gnu.org/software/tar/manual/html_node/Standard.html (see XHDTYPE)
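To make the difference between L and x concrete, here is a small sketch using the type flag names from GNU tar's tar.h. The payload of an L (or K) entry is a bare name, while the payload of an x (or g) entry is a list of key=value records. `payloadKind` and its return strings are hypothetical, chosen only for illustration.

```php
<?php
// Type flag names as defined in GNU tar's tar.h.
const GNUTYPE_LONGNAME = 'L'; // data: long file name for the next entry
const GNUTYPE_LONGLINK = 'K'; // data: long link target for the next entry
const XHDTYPE = 'x';          // data: pax records for the next entry
const XGLTYPE = 'g';          // data: pax records for all following entries

// Hypothetical helper: what kind of payload follows a header of this type?
function payloadKind(string $typeflag): string
{
    switch ($typeflag) {
        case GNUTYPE_LONGNAME:
        case GNUTYPE_LONGLINK:
            return 'raw-name';
        case XHDTYPE:
        case XGLTYPE:
            return 'pax-records';
        default:
            return 'regular';
    }
}
```

The practical consequence is that the data of an x entry cannot be used as a filename directly; it has to be parsed first.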

I found a Python implementation of tar here: https://svn.python.org/projects/python/tags/r31/Lib/tarfile.py . Check the

def create_pax_header(self, info):

method: basically, it checks whether the filename contains non-ASCII characters and, if it does, creates a PAX header. I will look into it more as well.
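The check described here, together with the record format pax uses, could look roughly like this in PHP. It is a sketch loosely modeled on that description, not a port of the Python code: `needsPaxPath` and `buildPaxRecord` are hypothetical names, and the 100-byte limit is the size of the ustar name field (the prefix field is ignored for simplicity).

```php
<?php
// Hypothetical helper: does this path need a pax "path" record?
// Assumption: anything non-ASCII or longer than the 100-byte name field does.
function needsPaxPath(string $path): bool
{
    return preg_match('/[^\x00-\x7F]/', $path) === 1 || strlen($path) > 100;
}

// Hypothetical helper: build one pax record "<length> <key>=<value>\n".
// The length counts the whole record in bytes, including its own digits.
function buildPaxRecord(string $key, string $value): string
{
    $base = strlen($key) + strlen($value) + 3; // space, '=', newline
    $length = $base;
    do {
        // Re-add the digit count until the length stops changing.
        $previous = $length;
        $length = $base + strlen((string) $previous);
    } while ($length !== $previous);

    return $length . ' ' . $key . '=' . $value . "\n";
}
```

For example, `buildPaxRecord('path', 'foo')` yields `"12 path=foo\n"`, which is 12 bytes long.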

ovidiul • Mar 13 '17 10:03

The pull request is here: https://github.com/splitbrain/php-archive/pull/19

ovidiul • Mar 13 '17 13:03

I found some more details explaining the pax extended headers here: https://www.ibm.com/support/knowledgecenter/SSLTBW_1.13.0/com.ibm.zos.r13.bpxa500/pxarchfm.htm#paxex
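In the pax format, the data of an extended header is a sequence of records of the form `<length> <keyword>=<value>\n`, where the length is the decimal byte count of the whole record. A minimal sketch of a parser follows, assuming the data blocks of the x entry have already been read into a string (possibly with NUL padding at the end). `parsePaxRecords` is a hypothetical helper, not an existing php-archive method.

```php
<?php
// Sketch only: parse pax extended header data into keyword => value pairs.
function parsePaxRecords(string $data): array
{
    $records = [];
    $offset = 0;
    $total = strlen($data);

    while ($offset < $total) {
        $space = strpos($data, ' ', $offset);
        if ($space === false) {
            break; // no further record (e.g. only NUL padding left)
        }
        $lengthField = substr($data, $offset, $space - $offset);
        if ($lengthField === '' || strspn($lengthField, '0123456789') !== strlen($lengthField)) {
            break; // not a decimal length, stop parsing
        }
        $length = (int) $lengthField;
        $fieldLen = $space - $offset;
        if ($length < $fieldLen + 2 || $offset + $length > $total) {
            break; // malformed or truncated record
        }
        // Strip the length field, the space and the trailing newline.
        $record = substr($data, $space + 1, $length - $fieldLen - 2);
        $eq = strpos($record, '=');
        if ($eq !== false) {
            $records[substr($record, 0, $eq)] = substr($record, $eq + 1);
        }
        $offset += $length;
    }

    return $records;
}
```

If the result contains a `path` key, that value would replace the filename from the ustar header that follows the x entry.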

ovidiul • Mar 13 '17 17:03

If I may add my 2 cents to this discussion: it seems that there may also be g blocks that contain globally applicable pax data... :grimacing: Also, the IBM link posted above is dead; here is a working one: https://www.ibm.com/docs/en/zos/2.4.0?topic=SSLTBW_2.4.0/com.ibm.zos.v2r4.bpxa500/paxhead.htm#paxhead
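How the g and x blocks combine can be sketched as a simple precedence rule: per the pax format, records from an x header override records from g headers, and both override the fields of the ustar header. `effectiveAttributes` is a hypothetical helper, and the three arrays are assumed to be keyword => value maps that have already been parsed.

```php
<?php
// Sketch only: resolve the attributes of one entry.
// $ustar    fields from the plain ustar header (lowest priority)
// $global   records from preceding 'g' headers
// $extended records from the 'x' header of this entry (highest priority)
function effectiveAttributes(array $ustar, array $global, array $extended): array
{
    // With string keys, array_merge lets later arrays override earlier ones.
    return array_merge($ustar, $global, $extended);
}
```

A reader would keep `$global` across entries and reset `$extended` after every regular entry.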

milux • Aug 31 '22 10:08

By the way, we have a real-world need for pax support: https://github.com/dennis-eisen/CT_AutoUpdater/issues/8 However, I totally understand how limited resources are in FOSS projects like this one, and unfortunately I don't have the time to do it myself. :(

milux • Aug 31 '22 11:08