pdf2xml from http://sourceforge.net/projects/pdf2xml/

pdftoxml

version 1.0.0 July 2007

The Xpdf software and documentation are copyright 1996-2007 Glyph & Cog, LLC.

Email: [email protected] WWW: http://www.foolabs.com/xpdf/

The PDF data structures, operators, and specification are copyright 1985-2006 Adobe Systems Inc.

The libxml2 software and documentation are released under the MIT License. See the Copyright file in the distribution for the precise wording.

What is pdftoxml?

pdftoxml is an open source PDF to XML converter. It runs on Linux and Win32 systems.

pdftoxml is based on xpdf and is essentially a (large) modification of pdftotext that generates XML instead of plain text. The XML generation uses the libxml2 library.

Distribution

pdftoxml is licensed under the GNU General Public License (GPL), version 2.

Compatibility

pdftoxml is developed and tested on a Linux 2.4 x86 system. In addition, it has been compiled on a Win32 system.

Getting pdftoxml

The latest version is available from: https://sourceforge.net/projects/pdf2xml

Source code is available from: http://pdf2xml.cvs.sourceforge.net/pdf2xml/

Running pdftoxml

To run pdftoxml, simply type:

pdftoxml.exe file.pdf
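
For scripted use, the invocation above can be wrapped in a small helper. The following is a minimal Python sketch, not part of the pdftoxml distribution. The function names build_command and convert are hypothetical, and it assumes the executable is on the PATH under the name shown above. It passes only the input file, since no other options are documented here.

```python
import shutil
import subprocess
from pathlib import Path


def build_command(pdf_path, executable="pdftoxml.exe"):
    """Return the argument list for the basic invocation: <executable> <file.pdf>."""
    # Only the input file is passed; no command-line options are assumed.
    return [executable, str(Path(pdf_path))]


def convert(pdf_path, executable="pdftoxml.exe"):
    """Run the converter on one PDF and return its exit code (hypothetical helper)."""
    # Assumption: the executable can be found on the PATH under this name.
    if shutil.which(executable) is None:
        raise FileNotFoundError(f"{executable} not found on PATH")
    if not Path(pdf_path).is_file():
        raise FileNotFoundError(f"no such PDF: {pdf_path}")
    # check=False: the caller decides how to treat a non-zero exit code.
    completed = subprocess.run(build_command(pdf_path, executable), check=False)
    return completed.returncode
```

On Linux the executable name may differ (for example, without the .exe suffix); pass it through the executable parameter in that case.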

Command-line options and many other details are (or should be) described on the SourceForge project page.

Compiling pdftoxml

See the separate file, INSTALL.

Contributors

Hervé Déjean (src)
Sophie Andrieu (src)
Jean-Yves Vion-Dury (schemas)