pdf2xml

pdf2xml convertor based on Xpdf library - modified version

Results 11 pdf2xml issues

$ gdb ./pdf2xml
(gdb) r 05-Stack-buffer-overflow-XRef-getObjectStream.pdf test.xml
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Syntax Error (593163): Dictionary key must be a name object
Syntax Error (593170):...

$ ./pdf2xml 04-Memory leaks-TextPage-testLinkedText.pdf test.xml
=================================================================
==82085==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 99000 byte(s) in 99 object(s) allocated from:
    #0 0x7f1554619602 in malloc (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x98602)
    #1 0x7f1553d2f735 (/usr/lib/x86_64-linux-gnu/libxml2.so.2+0x2e735)
Direct...

$ gdb ./pdf2xml
(gdb) r 03-Unknow-pointer-dereference-TextPage-restoreState.pdf test.xml
Program received signal SIGSEGV, Segmentation fault.
0x000000000041f38e in TextPage::restoreState (this=0x61500000b980, state=0x61700000f900) at /home/test/pdf2xml_analysis/pdf2xml/src/XmlOutputDev.cc:2765
2765 idCur = idStack.top();
(gdb) x/5i $rip
=> 0x41f38e :...

$ ./pdf2xml 02-Heap-buffer-overflow-addAttributsNode.pdf test.xml
=================================================================
==57105==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x60200004999a at pc 0x7f0869e219f5 bp 0x7fffaa6b3610 sp 0x7fffaa6b2da0
WRITE of size 12 at 0x60200004999a thread T0
    #0 0x7f0869e219f4 in __interceptor_vsprintf...

$ ./pdf2xml 01-Heap-buffer-overflow-TextPage-dump.pdf test.xml
=================================================================
==36659==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x60200004405a at pc 0x7fb7e473d9f5 bp 0x7ffc8a9d8c60 sp 0x7ffc8a9d83f0
WRITE of size 11 at 0x60200004405a thread T0
    #0 0x7fb7e473d9f4 in __interceptor_vsprintf...

$ gdb ./pdf2xml
(gdb) r 01-NULL-pointer-dereference-TextPage-restoreState test.xml
[00-NULL-pointer-dereference-TextPage-restoreState.pdf](https://github.com/kermitt2/pdf2xml/files/4876379/00-NULL-pointer-dereference-TextPage-restoreState.pdf)
Program received signal SIGSEGV, Segmentation fault.
0x000000000040e29b in TextPage::restoreState (state=0x7a02a0, this=0x7a2100) at /home/test/pdf2xml/src/XmlOutputDev.cc:2765
2765 idCur = idStack.top();
(gdb) bt
#0 0x000000000040e29b in...

https://link.springer.com/chapter/10.1007%2F978-3-642-21560-5_33

It would be good to be able to process PDFs without requiring a file on disk. For example, the following should work: ``` cat sample.pdf | pdftoxml...
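
Until something like this is supported natively, one possible workaround is a small wrapper that spools standard input to a temporary file and then runs the converter on it. The sketch below is only an illustration, not part of pdf2xml: the helper names `spool_to_tempfile` and `convert_stream` are hypothetical, and the only thing taken from the reports above is the `./pdf2xml input.pdf output.xml` invocation. It still writes to disk briefly, so it does not replace real stdin support in the tool.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def spool_to_tempfile(stream, suffix=".pdf"):
    """Copy a binary stream into a named temporary file and return its path."""
    # delete=False keeps the file on disk after the handle is closed,
    # so an external process can open it by name.
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
        shutil.copyfileobj(stream, tmp)
        return Path(tmp.name)


def convert_stream(stream, output_xml, command=("./pdf2xml",)):
    """Run the converter on PDF bytes read from `stream`.

    `command` is assumed to accept `<input.pdf> <output.xml>` as its last
    two arguments, as in the invocations shown in the issues above.
    Returns the exit code of the converter process.
    """
    pdf_path = spool_to_tempfile(stream)
    try:
        result = subprocess.run([*command, str(pdf_path), str(output_xml)])
        return result.returncode
    finally:
        # Remove the temporary copy whether or not the conversion succeeded.
        pdf_path.unlink()
```

A wrapper script could call `convert_stream(sys.stdin.buffer, "test.xml")`, so that a PDF piped in with `cat sample.pdf | python wrapper.py` is converted (the script name `wrapper.py` is likewise hypothetical).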