Inspect method not picking up embedded JavaScript

Open mthomasuk opened this issue 5 years ago • 1 comments

Hey guys, really enjoying using unidoc. I just spotted something (apologies if this is an oversight on my part).

The pdfReader.Inspect method doesn't appear to pull the JavaScript out of the following file:

%PDF-1.7
4 0 obj
<<
/Length 0
>>
stream

endstream endobj
5 0 obj
<<
/Type /Page
/Parent 2 0 R
/Resources 3 0 R
/Contents 4 0 R
/MediaBox [ 0 0 612 792 ]

>>
 endobj
1 0 obj
<<
/Type /Catalog
/Pages 2 0 R
/OpenAction [ 5 0 R /Fit ]
  /Names << % the Javascript entry
    /JavaScript <<
      /Names [
        (EmbeddedJS)
        <<
          /S /JavaScript
          /JS (
            app.alert('Hello, World!');
          )
        >>
      ]
    >>
  >> % end of the javascript entry
>>
 endobj
2 0 obj
<<
/Type /Pages
/Count 1
/Kids [ 5 0 R ]

>>
 endobj
3 0 obj
<<
>>
 endobj
xref
0 6
0000000000 65535 f 
0000000166 00000 n 
0000000244 00000 n 
0000000305 00000 n 
0000000009 00000 n 
0000000058 00000 n 
trailer <<
/Size 6
/Root 1 0 R
>>
startxref
327
%%EOF

It correctly identifies the number of pages etc., but it doesn't pick up the JS around line 25 of the file.

Additional question: do you guys support stripping this kind of stuff from the file? The example does a great job of explaining how to find JS/Flash/Video in a PDF, but not how to remove it (if that's possible).

Thanks!

mthomasuk, Sep 11 '18 14:09

Thanks. The reason is that the Inspect method is currently not recursive: it only looks at first-level objects (it loops through the indirect objects), so JavaScript nested inside a dictionary such as the catalog's /Names entry is missed.

To make this work, we need to refactor the parser.inspect method to walk through the PDF object structure recursively.

gunnsth, Sep 12 '18 16:09
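
For illustration, the recursive walk described in the reply above might look roughly like the sketch below. It is not unidoc's code and does not use unidoc's API: the Object, Name, Str, Array, Dict, Ref and Doc types and the findJavaScript function are hypothetical stand-ins for a parsed PDF object model, assumed here only to show the idea. The walk starts from every indirect object, descends into nested dictionaries and arrays, follows indirect references, and tracks visited object numbers so that cycles such as /Parent and /Kids do not loop forever.

```go
package main

// Hypothetical, simplified PDF object model for illustration only.
// These are NOT unidoc's types.
type Object interface{}

type Name string            // a PDF name, e.g. /JavaScript
type Str string             // a PDF string, e.g. (app.alert(...))
type Array []Object         // a PDF array
type Dict map[string]Object // a PDF dictionary, keys without the leading slash
type Ref int                // an indirect reference, by object number

// Doc maps object numbers to their indirect (first-level) objects.
type Doc map[int]Object

// findJavaScript walks every object reachable from the indirect objects
// and collects the /JS string of each dictionary that has /S /JavaScript.
// If several scripts exist, their order is not guaranteed, because Go map
// iteration order is unspecified.
func findJavaScript(doc Doc) []string {
	var scripts []string
	visited := map[int]bool{} // object numbers already walked

	var walk func(obj Object)
	walk = func(obj Object) {
		switch v := obj.(type) {
		case Dict:
			// A JavaScript action is a dictionary with /S /JavaScript.
			if s, ok := v["S"].(Name); ok && s == "JavaScript" {
				if js, ok := v["JS"].(Str); ok {
					scripts = append(scripts, string(js))
				}
			}
			// Descend into nested values; a first-level-only scan stops here.
			for _, child := range v {
				walk(child)
			}
		case Array:
			for _, child := range v {
				walk(child)
			}
		case Ref:
			n := int(v)
			if visited[n] {
				return // already handled; also breaks reference cycles
			}
			visited[n] = true
			walk(doc[n])
		}
	}

	for n, obj := range doc {
		if visited[n] {
			continue
		}
		visited[n] = true
		walk(obj)
	}
	return scripts
}
```

Applied to a model of the file in the original post, where the action sits three levels down inside the catalog's /Names entry, a walk like this reaches the script, whereas a check of only the top-level dictionaries would not.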