Problems with pseudo url encoded directories

Open f-hofmann opened this issue 9 years ago • 2 comments

It seems phpdox (or DOMDocument) has difficulties reading the config file from a directory whose name contains pseudo URL-encoded parts.

How to reproduce

I've created a simple phpdox.xml file:

<?xml version="1.0" encoding="utf-8" ?>
<phpdox xmlns="http://xml.phpdox.net/config">
    <project name="Example" source="SOME_DESCRIPTION_WHICH_CONF_FILE_WAS_READ" workdir="build/api/xml">
        <collector backend="parser"/>
        <generator output="build/api">
            <build engine="html" output="html"/>
        </generator>
    </project>
</phpdox>

I've prepared two empty directories:

  • /tmp/test%2Fdir/
  • /tmp/testdir/

Each of these directories has a copy of the phpdox.xml file. The only difference is that each copy points to a different source. The source does not exist; it only serves to distinguish the config files.
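The setup described above could be scripted roughly like this. It is only a sketch: `BASE` and `write_config` are illustrative names, and the marker value for the `%2F` directory is an assumption, since the original report only shows the marker for `testdir`.

```shell
#!/bin/sh
# Reproduction setup sketch. BASE and write_config are illustrative names,
# not part of phpDox.
BASE="${BASE:-/tmp}"

write_config() {
    # $1 = target directory, $2 = marker written into the source attribute
    mkdir -p "$1"
    cat > "$1/phpdox.xml" <<EOF
<?xml version="1.0" encoding="utf-8" ?>
<phpdox xmlns="http://xml.phpdox.net/config">
    <project name="Example" source="$2" workdir="build/api/xml">
        <collector backend="parser"/>
        <generator output="build/api">
            <build engine="html" output="html"/>
        </generator>
    </project>
</phpdox>
EOF
}

# The marker only identifies which config file was read; no such source exists.
write_config "$BASE/testdir" "phpdox.xml_from_testdir"
# Assumed marker name for the pseudo URL-encoded directory.
write_config "$BASE/test%2Fdir" "phpdox.xml_from_test%2Fdir"
```

After running it, `phpdox -f phpdox.xml` can be started from each directory in turn.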

Expected behaviour

In each directory I should be able to run phpdox. As there is no source folder, I would expect an error message, telling me that the specified source-folder does not exist.

Actual behaviour

cd /tmp/testdir/ && phpdox -f phpdox.xml
> phpDox 0.8.1.1 - Copyright (C) 2010 - 2016 by Arne Blankerts
> 
> [20.06.2016 - 15:25:56] Using config file 'phpdox.xml'
> ...
> [20.06.2016 - 15:25:56] Starting collector
> 
> An application error occured while processing:
> 
>   Invalid src directory "phpdox.xml_from_testdir" specified
> 
> Please verify your configuration.

This is the expected behaviour. Now repeating the same in the pseudo URL-encoded directory:

cd /tmp/test%2Fdir/ && phpdox -f phpdox.xml
> phpDox 0.8.1.1 - Copyright (C) 2010 - 2016 by Arne Blankerts
> 
> 
> An error occured while trying to load the configuration file:
> 
> Parsing config file 'phpdox.xml' failed.

To verify my assumption about the URL-encoded directory, create a third directory:

mkdir -p /tmp/test/dir
# put the same phpdox.xml in there, just alter the source to something distinguishable, e.g. "phpdox.xml_from_test/dir"
cd /tmp/test%2Fdir/ && phpdox -f phpdox.xml
> phpDox 0.8.1.1 - Copyright (C) 2010 - 2016 by Arne Blankerts
> 
> [20.06.2016 - 15:32:44] Using config file 'phpdox.xml'
> ...
> [20.06.2016 - 15:32:44] Starting collector
> 
> An application error occured while processing:
> 
>   Invalid src directory "phpdox.xml_from_test/dir" specified
> 
> Please verify your configuration.   

I've also checked your code, and it seems the root "cause" is libxml, which tries to decode the path. I've tried encoding the pseudo part %2F as %252F, but this breaks your FileInfo checks.
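To illustrate why the run from /tmp/test%2Fdir/ picks up the config in /tmp/test/dir/, here is a small sketch of the decoding step. It assumes libxml unescapes the path as the output above suggests. `decode_slash` is an illustrative helper that handles only the `%2F` escape, not a full percent-decoder.

```shell
#!/bin/sh
# decode_slash is an illustrative helper, not part of phpDox or libxml2.
# It rewrites only %2F (or %2f) to "/", which is enough for this case.
decode_slash() {
    printf '%s\n' "$1" | sed 's|%2[Ff]|/|g'
}

# The path PHP sees versus the path a percent-decoding consumer would open:
decode_slash "/tmp/test%2Fdir/phpdox.xml"
# prints /tmp/test/dir/phpdox.xml
```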

We are experiencing these problems on an older Debian 7 running PHP 5.6 and on Ubuntu 16.04 running PHP 5.6.

  • libxml2: 2.9.3
  • php5.6.22

thanks & best felix

f-hofmann avatar Jun 20 '16 15:06 f-hofmann

Thanks for reporting and debugging this.

I have to admit, I never considered URL-encoding a "/" to get it into a directory name. Apart from the fact that phpDox shouldn't have issues with it, I fail to see the motivation for this attempt: there won't be any code anywhere decoding it for output purposes, so there will always be %2F in the HTML markup... So what's the point?

The fact that libxml decodes it arguably makes sense: a file path can be interpreted as a URI and thus can (and probably has to) be decoded. Since PHP doesn't do that for file accesses, that's where your problem starts.

I'm not sure there is an easy fix for that. Running URL-decode on all PHP-level file system calls doesn't feel like a good solution...

theseer avatar Jun 20 '16 22:06 theseer

We came across this issue during some changes to our Jenkins CI infrastructure. We started using the multibranch plugin in Jenkins, which encodes branch names this way:

Branch names get encoded (i.e. a forward slash '/' becomes '%2F'), so some things may not work without additional configuration. [...]

After switching to the multibranch plugin, nearly all other steps like phpunit, phpcs, md ... failed: literally everything that somehow depends on PHP-based filesystem functions and libxml.

Although this specific example uses an encoded "/", the issue also applies to other "encoded" strings. As you said, libxml requires properly encoded URIs. Therefore only properly encoded URIs should be passed to libxml, regardless of whether the URI references a local file or a remote resource.

Using the file:// scheme does not work either. It seems PHP running on *nix and libxml expect two different things:

  • php: file://-URIs do not support percent encoding
  • libxml: thumbs up for percent encoding

With this in mind, the ConfigLoader could take care of the encoding that libxml/(f)DOMDocument expects:

An ugly test with

# ConfigLoader:122
$fparts = explode(DIRECTORY_SEPARATOR, $fname);
$fparts = array_map("rawurlencode", $fparts);
$dom->load(join(DIRECTORY_SEPARATOR, $fparts));

led to the expected result, but revealed another problem in the php-dom extension:

Relative paths passed to DOMDocument::load() are automatically completed with the cwd, but the cwd is not encoded in the process. Passing the properly encoded full path to DOMDocument::load() works. Remember, properly encoded means percent-encoded and therefore not accessible via PHP... :(
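As a rough sketch of what "properly encoded full path" means for the example directory: the literal "%" has to be escaped as "%25" so that a percent-decoding consumer ends up at the original name again. `encode_percent` is an illustrative helper and escapes only "%", whereas rawurlencode in the PHP test above covers all reserved characters.

```shell
#!/bin/sh
# encode_percent is an illustrative helper, not part of phpDox or PHP.
# It escapes only the "%" character; other reserved characters are untouched.
encode_percent() {
    printf '%s\n' "$1" | sed 's/%/%25/g'
}

# Encode the absolute path, not the relative file name, so that the
# directory part is covered as well.
encode_percent "/tmp/test%2Fdir/phpdox.xml"
# prints /tmp/test%252Fdir/phpdox.xml
```

The resulting string is only meant for the libxml side; PHP's own filesystem functions would still need the unencoded path.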

Although rare, I still think that there is a need for a proper encoding of the URIs passed to (f)DOMDocument, but I'm not sure if phpdox is the right place.

Lesson learned: stick to [a-zA-Z0-9]* for file and directory names ;)

f-hofmann avatar Jun 21 '16 09:06 f-hofmann