Fix for adjacent escaped slashes and escaped parentheses in strings

Open GreyWyvern opened this issue 3 months ago • 1 comments

Type of pull request

  • [X] Bug fix (involves code and configuration changes)

About

The current (string) replacement regexp in formatContent() only looks back two characters when checking for escaped backslashes. If an escaped backslash immediately precedes an escaped parenthesis (for example `\\\(`), the script incorrectly reads the sequence as an escaped backslash followed by an unescaped parenthesis. As a result, the loop either never finds the end of the string (for an open parenthesis) or finds it prematurely (for a close parenthesis).

This PR instead performs a string replace that strips all escaped backslashes first and then all escaped parentheses; neither is needed when only checking for balanced, unescaped parentheses. The same backslash removal is added to the inline images section above it, for the same reason.
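
To illustrate the idea, here is a minimal sketch in Python. It is not the PHP code from this PR, and the helper names `strip_escapes` and `paren_depth` are hypothetical; it only assumes that `\\` is an escaped backslash and that `\(` and `\)` are escaped parentheses.

```python
def strip_escapes(text: str) -> str:
    """Remove escaped backslashes first, then escaped parentheses."""
    # Order matters: once every "\\" pair is gone, a backslash that is
    # still directly in front of a parenthesis really does escape it.
    text = text.replace("\\\\", "")
    return text.replace("\\(", "").replace("\\)", "")


def paren_depth(text: str) -> int:
    """Net nesting depth of the unescaped parentheses in text."""
    cleaned = strip_escapes(text)
    return cleaned.count("(") - cleaned.count(")")


# Escaped backslash directly followed by an escaped open parenthesis:
# the inner "(" must not count, so the string is balanced.
print(paren_depth(r"(a\\\(b)"))  # 0

# Escaped backslash directly followed by a real close parenthesis:
# the ")" does count, so the string is balanced here too.
print(paren_depth(r"(a\\)"))  # 0

# A lone escaped close parenthesis leaves the string still open.
print(paren_depth(r"(a\)"))  # 1
```

Reversing the two replacements would break the second case: `\)` would be removed from `(a\\)` first, leaving a stray backslash and an apparently unclosed string.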

Resolves #709.

Checklist for code / configuration changes

In case you changed the code/configuration, please read each of the following checkboxes as they contain valuable information:

  • [X] Please add at least one test case (unit test, system test, ...) to demonstrate that the change is working. If existing code was changed, your tests cover these code parts as well.
  • [X] Please run PHP-CS-Fixer before committing, to conform with our coding styles. See https://github.com/smalot/pdfparser/blob/master/.php-cs-fixer.php for more information about our coding styles.
  • [X] In case you fix an existing issue, please do one of the following:
    • [X] Write in this text something like fixes #1234 to outline that you are providing a fix for the issue #1234.

GreyWyvern avatar May 13 '24 18:05 GreyWyvern

@huihuangjiuai Does this fix #709 for you?

k00ni avatar May 14 '24 06:05 k00ni