pdf2xml
pdf2xml from http://sourceforge.net/projects/pdf2xml/
pdftoxml
version 1.0.0 July 2007
The Xpdf software and documentation are copyright 1996-2007 Glyph & Cog, LLC.
Email: [email protected] WWW: http://www.foolabs.com/xpdf/
The PDF data structures, operators, and specification are copyright 1985-2006 Adobe Systems Inc.
The libxml2 software and documentation are released under the MIT License. See the Copyright file in the distribution for the precise wording.
What is pdftoxml?
pdftoxml is an open-source PDF-to-XML converter. pdftoxml runs on Linux and Win32 systems.
pdftoxml is based on xpdf and is essentially a (large) modification of pdftotext that generates XML instead of plain text. The XML generation uses the libxml2 library.
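To make the difference between plain-text and XML output concrete, here is a minimal Python sketch of what a plain-text dump discards and what an XML dump can keep. It is not pdftoxml's own code (pdftoxml is C++ built on xpdf and libxml2); Python's `xml.etree.ElementTree` stands in for libxml2, and the word boxes, the element names `PAGE` and `TOKEN`, and the attribute names are hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical word boxes for one page: (text, x, y, font_size) in PDF points.
# These names and values are illustrative, not pdftoxml's actual data model.
words = [("Hello", 72.0, 700.0, 12.0), ("world", 110.5, 700.0, 12.0)]

def as_plain_text(words):
    # A plain-text dump keeps only the characters, in reading order.
    return " ".join(text for text, _x, _y, _size in words)

def as_xml(words, page_number=1):
    # An XML dump can keep the characters plus layout information.
    # PAGE and TOKEN are hypothetical element names for this sketch.
    page = ET.Element("PAGE", number=str(page_number))
    for text, x, y, size in words:
        # XML attributes are strings; ":g" drops a trailing ".0".
        tok = ET.SubElement(page, "TOKEN", x=f"{x:g}", y=f"{y:g}", size=f"{size:g}")
        tok.text = text
    return ET.tostring(page, encoding="unicode")
```

Both functions see the same words, but only the XML form still records where each word sits on the page and at what size, which is the information a downstream layout or structure analysis needs.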
Distribution
pdftoxml is licensed under the GNU General Public License (GPL), version 2.
Compatibility
pdftoxml is developed and tested on a Linux 2.4 x86 system. In addition, it has been compiled on a Win32 system.
Getting pdftoxml
The latest version is available from: https://sourceforge.net/projects/pdf2xml
Source code is available from: http://pdf2xml.cvs.sourceforge.net/pdf2xml/
Running pdftoxml
To run pdftoxml, simply type:
pdftoxml.exe file.pdf
On Linux, the executable is named pdftoxml (without the .exe suffix).
Command-line options and many other details are (or should be) described on the SourceForge project page.
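The command can also be scripted. The following is a hedged Python sketch: it runs pdftoxml with only the input file, as documented, and then pulls the words of each page out of the result. Two things in it are assumptions, not documented behavior: that the output is written next to the PDF with an `.xml` extension (by analogy with pdftotext writing `.txt`), and that words appear as `TOKEN` elements inside `PAGE` elements. The helper names `run_pdftoxml` and `page_texts` are hypothetical. Check both assumptions against real output and the schemas shipped with pdftoxml before relying on them.

```python
import subprocess
import xml.etree.ElementTree as ET
from pathlib import Path

def run_pdftoxml(pdf_path, exe="pdftoxml"):
    """Run pdftoxml on one PDF and return the path of the XML file.

    ASSUMPTION: the output lands next to the input with an .xml suffix.
    On Win32, pass exe="pdftoxml.exe".
    """
    pdf = Path(pdf_path)
    # check=True raises CalledProcessError if pdftoxml exits non-zero.
    subprocess.run([exe, str(pdf)], check=True)
    return pdf.with_suffix(".xml")

def page_texts(xml_data):
    """Return one string per PAGE element, joining its TOKEN texts with spaces.

    ASSUMPTION: PAGE and TOKEN are illustrative element names.
    xml_data may be str or bytes.
    """
    root = ET.fromstring(xml_data)
    return [
        " ".join(tok.text for tok in page.iter("TOKEN") if tok.text)
        for page in root.iter("PAGE")
    ]
```

Under those assumptions, `page_texts(run_pdftoxml("file.pdf").read_bytes())` returns a list with one string per page. Passing bytes rather than text lets the XML parser honor whatever encoding the file declares.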
Compiling pdftoxml
See the separate file, INSTALL.
Contributors
Hervé Déjean (src)
Sophie Andrieu (src)
Jean-Yves Vion-Dury (schemas)