pdftohtmlr
Ruby wrapper around the pdftohtml command line utility (around xpdf)
h1. pdftohtmlr
Wrapper around the command line tool pdftohtml which converts PDF to HTML, go figure.
This gem was inspired by the MiniMagick gem - which does the same thing for ImageMagick (thanks Corey).
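To make the wrapper idea concrete, here is a minimal sketch of how a Ruby method might shell out to pdftohtml and capture the HTML. It is not the gem's actual implementation: the helper names pdftohtml_command and pdf_to_html are hypothetical, and the sketch assumes the poppler build of pdftohtml, which accepts the -stdout, -noframes, -i, -upw and -opw flags.

```ruby
require 'open3'

# Hypothetical helper, not part of pdftohtmlr: builds the argument list for
# a pdftohtml call that writes one HTML page to stdout and skips images.
def pdftohtml_command(pdf_path, user_password: nil, owner_password: nil)
  cmd = ['pdftohtml', '-stdout', '-noframes', '-i']
  cmd += ['-upw', user_password] if user_password
  cmd += ['-opw', owner_password] if owner_password
  cmd << pdf_path
end

# Hypothetical helper: runs the command and returns the HTML as a string.
# Raises if pdftohtml exits with a non-zero status.
def pdf_to_html(pdf_path, **passwords)
  output, error, status = Open3.capture3(*pdftohtml_command(pdf_path, **passwords))
  raise "pdftohtml failed: #{error}" unless status.success?
  output
end
```

Passing the arguments as an array rather than one interpolated string means the path and passwords never go through a shell, so spaces and quotes in file names need no escaping.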
h1. Requirements
Just pdftohtml and Ruby (1.8.6+ as far as I know).
On Mac:
brew install pdftohtml
On Ubuntu: pdftohtml ships in the 'poppler-utils' package, which is usually installed by default.
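If you are unsure whether pdftohtml is installed, a quick check from Ruby can save some confusion later. The sketch below uses only the standard library; find_executable is a hypothetical helper written for this example, not something the gem provides.

```ruby
# Hypothetical helper: returns the full path of the first executable called
# `name` found in the PATH-style string `path`, or nil if there is none.
def find_executable(name, path = ENV['PATH'].to_s)
  path.split(File::PATH_SEPARATOR).each do |dir|
    candidate = File.join(dir, name)
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  nil
end

if find_executable('pdftohtml')
  puts 'pdftohtml found'
else
  warn 'pdftohtml not found on PATH; install it before using pdftohtmlr'
end
```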
h1. Install
"http://gemcutter.org/gems/pdftohtmlr":http://gemcutter.org/gems/pdftohtmlr
gem install pdftohtmlr
h1. Using
"gist examples":http://gist.github.com/254556
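require 'pdftohtmlr'
require 'nokogiri'
include PDFToHTMLR
# Replace the placeholder with the path to your source PDF
file = PdfFilePath.new('path/to/source.pdf')
# Convert the PDF and get the HTML back as a string
string = file.convert
# Convert the PDF and get the HTML back as a parsed Nokogiri document
doc = file.convert_to_document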
See included test cases for more usage examples, including passwords and URL fetching.
h1. License
MIT (See included MIT-LICENSE)