unipdf
Inspect method not picking up embedded JavaScript
Hey guys, really enjoying using unidoc. I just spotted something (apologies if this is an oversight on my part).
The pdfReader.Inspect method doesn't appear to pull the JavaScript out of the following file:
%PDF-1.7
4 0 obj
<<
/Length 0
>>
stream
endstream endobj
5 0 obj
<<
/Type /Page
/Parent 2 0 R
/Resources 3 0 R
/Contents 4 0 R
/MediaBox [ 0 0 612 792 ]
>>
endobj
1 0 obj
<<
/Type /Catalog
/Pages 2 0 R
/OpenAction [ 5 0 R /Fit ]
/Names << % the Javascript entry
/JavaScript <<
/Names [
(EmbeddedJS)
<<
/S /JavaScript
/JS (
app.alert('Hello, World!');
)
>>
]
>>
>> % end of the javascript entry
>>
endobj
2 0 obj
<<
/Type /Pages
/Count 1
/Kids [ 5 0 R ]
>>
endobj
3 0 obj
<<
>>
endobj
xref
0 6
0000000000 65535 f
0000000166 00000 n
0000000244 00000 n
0000000305 00000 n
0000000009 00000 n
0000000058 00000 n
trailer <<
/Size 6
/Root 1 0 R
>>
startxref
327
%%EOF
It correctly identifies the number of pages and so on; it just doesn't pick up the JS around line 25.
Additional question: do you guys support stripping this kind of content from the file? The example does a great job of explaining how to find JS/Flash/Video in a PDF, but not how to remove it (if that's possible).
Thanks!
Thanks. The reason is that the Inspect method is currently not recursive: it only looks at first-level objects (it loops through the indirect objects), so an inline dictionary nested inside the catalog is never visited.
To make this work, we need to refactor the parser.inspect method to walk through the PDF object structure recursively.
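For illustration, here is a rough sketch of the difference between a first-level scan and a recursive walk. It does not use unipdf's types or its actual Inspect code: Object, Dict, Array, Ref, Name, Str, hasJSShallow, findJS and sampleDoc are all hypothetical names made up for this example, and the sample data only mirrors the structure of the file posted above (the JS string is simplified to a single line). A real implementation would also have to handle /JS values stored as streams.

```go
package main

import "fmt"

// Object is a hypothetical stand-in for a parsed PDF object.
// None of these types come from unipdf.
type Object interface{}

type (
	Dict  map[string]Object // PDF dictionary: << /Key value ... >>
	Array []Object          // PDF array: [ ... ]
	Ref   int               // indirect reference such as "5 0 R"
	Name  string            // PDF name such as /JavaScript
	Str   string            // PDF string such as (text)
)

// hasJSShallow mimics a first-level scan: it only checks the keys of
// each top-level indirect object and never descends into nested values.
func hasJSShallow(objs map[Ref]Object) bool {
	for _, o := range objs {
		if d, ok := o.(Dict); ok {
			if _, found := d["JS"]; found {
				return true
			}
		}
	}
	return false
}

// findJS walks obj recursively and appends every /JS string it finds.
// The seen map stops the walk from looping on cycles such as
// page -> /Parent -> pages -> /Kids -> page.
func findJS(obj Object, objs map[Ref]Object, seen map[Ref]bool, out *[]string) {
	switch v := obj.(type) {
	case Ref:
		if seen[v] {
			return
		}
		seen[v] = true
		findJS(objs[v], objs, seen, out) // a missing object is nil and is ignored
	case Dict:
		for key, child := range v {
			if s, ok := child.(Str); ok && key == "JS" {
				*out = append(*out, string(s))
				continue
			}
			findJS(child, objs, seen, out)
		}
	case Array:
		for _, child := range v {
			findJS(child, objs, seen, out)
		}
	}
}

// sampleDoc mirrors the object layout of the PDF posted in this thread.
func sampleDoc() map[Ref]Object {
	return map[Ref]Object{
		1: Dict{
			"Type":       Name("Catalog"),
			"Pages":      Ref(2),
			"OpenAction": Array{Ref(5), Name("Fit")},
			"Names": Dict{"JavaScript": Dict{"Names": Array{
				Str("EmbeddedJS"),
				Dict{"S": Name("JavaScript"), "JS": Str("app.alert('Hello, World!');")},
			}}},
		},
		2: Dict{"Type": Name("Pages"), "Count": 1, "Kids": Array{Ref(5)}},
		3: Dict{},
		4: Dict{"Length": 0},
		5: Dict{"Type": Name("Page"), "Parent": Ref(2), "Resources": Ref(3), "Contents": Ref(4)},
	}
}

func main() {
	objs := sampleDoc()
	var scripts []string
	findJS(Ref(1), objs, map[Ref]bool{}, &scripts) // start at the catalog (/Root)
	fmt.Println("shallow scan found JS:", hasJSShallow(objs))
	fmt.Println("recursive walk found:", scripts)
}
```

The shallow scan misses the script here because the /JS entry sits three dictionaries deep inside object 1 rather than in its own indirect object, which matches the behaviour described above. Starting the walk from the trailer's /Root and tracking visited references is one possible design; walking every indirect object recursively would also work.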