Ruby wrapper around the pdftohtml command-line utility (itself built on xpdf).

h1. pdftohtmlr

A wrapper around the command-line tool pdftohtml, which converts PDF to HTML (go figure).

This gem was inspired by the MiniMagick gem, which does the same thing for ImageMagick (thanks, Corey).
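Under the hood, a wrapper like this mostly builds a pdftohtml command line, runs it, and captures the HTML the tool prints. The sketch below illustrates that idea only; it is not pdftohtmlr's actual implementation, the method names @pdftohtml_argv@ and @pdf_to_html@ are hypothetical, and it assumes a pdftohtml build (such as poppler's) that supports the @-stdout@, @-upw@ and @-opw@ options, plus Ruby 2.0+ for keyword arguments.

```ruby
require "open3"

# Hypothetical helper (not part of pdftohtmlr): builds the argument list for
# a pdftohtml call that writes the generated HTML to standard output.
def pdftohtml_argv(pdf_path, user_password: nil, owner_password: nil)
  argv = ["pdftohtml", "-stdout"]
  argv += ["-upw", user_password] if user_password   # user password, if the PDF needs one
  argv += ["-opw", owner_password] if owner_password # owner password, if given
  argv << pdf_path # the input file goes after all options
end

# Hypothetical helper: runs pdftohtml and returns its stdout as a String.
# Raises if pdftohtml exits non-zero (and Errno::ENOENT if it is not installed).
def pdf_to_html(pdf_path, user_password: nil, owner_password: nil)
  argv = pdftohtml_argv(pdf_path, user_password: user_password, owner_password: owner_password)
  html, errors, status = Open3.capture3(*argv)
  raise "pdftohtml failed: #{errors}" unless status.success?
  html
end
```

Usage would look like @html = pdf_to_html("/path/to/source.pdf", user_password: "secret")@. Passing the command as separate arguments to @Open3.capture3@ runs pdftohtml directly rather than through a shell, so file names containing spaces or quotes need no escaping.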

h1. Requirements

Just pdftohtml and Ruby (1.8.6+ as far as I know).

On Mac, with Homebrew:

bc. brew install pdftohtml

On Ubuntu: pdftohtml is provided by the poppler-utils package, which should be installed by default. If it is missing:

bc. sudo apt-get install poppler-utils
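If you are unsure whether pdftohtml is on your @PATH@, you can check from Ruby before using the gem. This is a minimal sketch, not part of pdftohtmlr; @find_executable_on_path@ is a hypothetical helper, and it assumes Unix-style executable names (on Windows the binary would be named @pdftohtml.exe@).

```ruby
# Hypothetical helper: returns the full path of the first executable file
# named `name` found in the given search path, or nil if there is none.
def find_executable_on_path(name, search_path = ENV.fetch("PATH", ""))
  search_path.split(File::PATH_SEPARATOR).each do |dir|
    candidate = File.join(dir, name)
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  nil
end

if (path = find_executable_on_path("pdftohtml"))
  puts "pdftohtml found at #{path}"
else
  warn "pdftohtml not found on PATH"
end
```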

h1. Install

"http://gemcutter.org/gems/pdftohtmlr":http://gemcutter.org/gems/pdftohtmlr

bc. gem install pdftohtmlr

h1. Using

"gist examples":http://gist.github.com/254556

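bc. require 'pdftohtmlr'
require 'nokogiri'   # needed for convert_to_document
include PDFToHTMLR
file = PdfFilePath.new("/path/to/source.pdf")   # path to the source PDF
string = file.convert                           # generated HTML as a String
doc = file.convert_to_document                  # HTML parsed with Nokogiri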

See the included test cases for more usage examples, including password-protected PDFs and fetching PDFs from a URL.

h1. License

MIT (see the included MIT-LICENSE file).