php-html-parser

Invalid internal use of preg_match_all()

Open. chaslain opened this issue 2 years ago (2 comments).

PHP Warning 'yii\base\ErrorException' with message 'preg_match_all(): Compilation failed: invalid range in character class at offset 4'

in vendor/paquettg/php-html-parser/src/PHPHtmlParser/Selector.php:91

The code was:

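use PHPHtmlParser\Dom;

// Runs inside a class method; $this->file_path points to a local HTML file.
$file = file_get_contents($this->file_path);
$dom = new Dom;
$dom->loadStr($file, []);

$rows = $dom->find("tr");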


PHP version: PHP 7.3.33 (cli) (built: Mar 18 2022 03:41:41) ( NTS )
Package version: 1.7.0

chaslain, Mar 06 '23 20:03

Same error when using the library's own method for loading from a file instead.

chaslain, Mar 06 '23 20:03

Same error when using it as follows:


require "../vendor/autoload.php";
use PHPHtmlParser\Dom;
$url = "https://google.com";
$dom = new Dom;
$dom->loadFromUrl($url);

Interestingly, I placed a var_dump just before that line to get you some more details:

var_dump($this->pattern, $selector);

Result:

string(103) "/([\w-:\*>]*)(?:\#([\w-]+)|\.([\w-]+))?(?:\[@?(!?[\w-:]+)(?:([!*^$]?=)["']?(.*?)["']?)?\])?([\/, ]+)/is"
string(29) "meta[http-equiv=Content-Type]"
Warning: preg_match_all(): Compilation failed: invalid range in character class at offset 4 in /home/fmaz878/vendor/paquettg/php-html-parser/src/PHPHtmlParser/Selector.php on line 92

Note that without the var_dump the error is on line 91.
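To check that the warning comes from the pattern itself rather than from the selector being parsed, the dumped pattern can be compiled on its own, outside the library. This is a minimal sketch assuming a plain PHP CLI (7.3 or later) with nothing else loaded; the variable names are illustrative and not part of php-html-parser.

```php
<?php
// Compile the pattern printed by var_dump() above, copied verbatim into a
// nowdoc so that neither the quotes nor the "$" need escaping.
$pattern = <<<'RE'
/([\w-:\*>]*)(?:\#([\w-]+)|\.([\w-]+))?(?:\[@?(!?[\w-:]+)(?:([!*^$]?=)["']?(.*?)["']?)?\])?([\/, ]+)/is
RE;

// preg_match() returns false when the pattern fails to compile. The @ silences
// the E_WARNING; error_get_last() still records it, so the message can be read back.
$result = @preg_match($pattern, 'meta[http-equiv=Content-Type]');

if ($result === false) {
    $error = error_get_last();
    echo $error['message'], PHP_EOL;
} else {
    echo "pattern compiled", PHP_EOL;
}
```

Offset 4 counts from the first character after the opening "/" delimiter, so it lands on the "-" in `([\w-:`.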

Specifically, the problem is the "-" right after \w: it is parsed as the start of a range, but a range cannot start at a class escape such as \w, so the pattern fails to compile. I won't pretend to understand the purpose of that regexp, but escaping the dash seems to resolve that specific issue (and creates a different error):

/([\w\-:\*>]*)(?:\#([\w\-]+)|\.([\w\-]+))?(?:\[@?(!?[\w\-:]+)(?:([!*^$]?=)["']?(.*?)["']?)?\])?([\/, ]+)/is
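The rule behind this can be shown with minimal patterns. Since PHP 7.3 the preg_* functions are backed by PCRE2, which gives a compile error for a hyphen placed directly after a character-type escape like \w unless the hyphen is the last character in the class. This is a sketch under that assumption; `compiles()` is a hypothetical helper, not a php-html-parser function.

```php
<?php
// Hypothetical helper: true if PCRE can compile the pattern, false otherwise.
// preg_match() returns false on a compile failure; @ hides the warning.
function compiles(string $pattern): bool
{
    return @preg_match($pattern, '') !== false;
}

$broken   = '/[\w-:]+/';  // "-" between \w and ":" is read as an invalid range
$escaped  = '/[\w\-:]+/'; // escaped "-" is a literal hyphen
$trailing = '/[\w:-]+/';  // "-" as the last class member is also literal

var_dump(compiles($broken));   // bool(false) on PHP >= 7.3
var_dump(compiles($escaped));  // bool(true)
var_dump(compiles($trailing)); // bool(true)
```

Either the escaped form or the trailing-hyphen form keeps the original intent (word characters, "-" and ":"); this thread does not say which one the maintainers would choose.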

FMaz008, Jan 20 '24 19:01