paper2html
Converts a single- or double-column PDF paper into an HTML page that shows the original view alongside a paragraph view extracted from the paper, so the text can be translated in the browser.
paper2html
Convert a PDF paper to an HTML page.
You can translate the paper easily with your browser's translation feature and view the original document and the translated document at the same time.

Albanie, Samuel, Sébastien Ehrhardt, and Joao F. Henriques. "Stopping gan violence: Generative unadversarial networks." arXiv preprint arXiv:1703.02528 (2017).
If you need more accurate conversion, you can also try an experimental service from the Allen Institute for AI.
Features
- Convert PDF files on the Internet easily by using a bookmarklet.
- Support for double-column papers.
Installing and running the paper2html server
Docker
$ docker run --rm -it -p 6003:6003 ghcr.io/ktaaaki/paper2html
Use with care, as this publishes port 6003 on the host.
Debian GNU/Linux, Ubuntu
$ sudo apt install poppler-utils poppler-data
$ git clone https://github.com/ktaaaki/paper2html.git
$ pip install -e paper2html
$ python3 ./paper2html/main.py
macOS
$ brew install poppler
$ git clone https://github.com/ktaaaki/paper2html.git
$ pip install -e paper2html
$ python3 ./paper2html/main.py
Windows
Download the Poppler for Windows binary from http://blog.alivate.com.au/poppler-windows/
Add the Poppler for Windows bin directory (e.g. C:\Users\YOUR_NAME\Downloads\poppler-0.68.0\bin) to the PATH environment variable.
Verify that the following command prints the path to pdfinfo.
> where.exe pdfinfo
Download the zip file or use git clone to save the paper2html code locally, then install and run it with the following commands.
> py -m pip install -e paper2html
> python .\paper2html\main.py
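The PATH check above can also be done from Python, which behaves the same way on Linux, macOS, and Windows. This is a minimal sketch: `find_tool` is a hypothetical helper name, not part of paper2html, and it only checks for `pdfinfo`, the one Poppler binary named in these instructions.

```python
import shutil


def find_tool(name="pdfinfo"):
    """Return the full path of `name` if it is found on PATH, else None.

    Hypothetical helper for illustration; not part of paper2html.
    """
    return shutil.which(name)


if __name__ == "__main__":
    path = find_tool()
    if path is None:
        print("pdfinfo not found: check that the Poppler bin directory is on PATH")
    else:
        print(f"pdfinfo found at {path}")
```

If the helper returns None, revisit the Poppler installation step for your platform before starting the server.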
Usage
Converting a PDF on the web to HTML with the paper2html server
Send the URL of a PDF file to the server by using this bookmarklet.
javascript:var esc=encodeURIComponent;var d=document;var subw=window.open('http://localhost:6003/paper2html/convert?url='+esc(location.href)).document;
Click the bookmarklet while a PDF paper is open in your browser.
The conversion then starts, and the generated HTML opens after a short while.
You can see the list of converted documents on the index page at localhost:6003/paper2html/index.html
NOTE👉 If you run the paper2html server in Docker, the bookmarklet cannot convert PDF files stored on the host OS. See the Docker image documentation.
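To make the bookmarklet's behavior concrete, here is a minimal Python sketch that builds the same conversion URL. The endpoint and port come from the bookmarklet above; `convert_url` is a hypothetical helper name, and the `safe` characters are chosen to approximate JavaScript's `encodeURIComponent`.

```python
from urllib.parse import quote

# Server address taken from the bookmarklet; adjust if your server runs elsewhere.
SERVER = "http://localhost:6003"


def convert_url(pdf_url, server=SERVER):
    """Build the URL that the bookmarklet opens for a given PDF URL.

    Hypothetical helper for illustration; not part of paper2html.
    """
    # Percent-encode everything except the characters encodeURIComponent leaves alone.
    encoded = quote(pdf_url, safe="!*'()")
    return f"{server}/paper2html/convert?url={encoded}"


if __name__ == "__main__":
    print(convert_url("https://arxiv.org/pdf/1703.02528.pdf"))
```

Opening the resulting URL in a browser should have the same effect as clicking the bookmarklet on that PDF, assuming the server is running.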
Converting a local PDF to HTML with the CLI
Run this command, then open the html file in your browser.
$ python paper2html/commands.py "path-to-paper-file.pdf"
In IPython, call the library directly.
>>> import paper2html
>>> paper2html.open_paper_htmls("path-to-paper-file.pdf")
You can specify which browser to use.
$ python paper2html/commands.py "path-to-paper-file.pdf" --browser_path="/path/to/browser"
You can also convert without opening a browser.
>>> import paper2html
>>> paper2html.paper2html("path-to-paper-file or directory")
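The CLI invocations shown earlier can also be scripted. The sketch below assembles the same command line, using only the script path and the `--browser_path` option that appear above; `build_command` and `convert` are hypothetical helper names, and the script path assumes you run from the directory that contains the paper2html checkout.

```python
import subprocess
import sys


def build_command(pdf_path, browser_path=None, script="paper2html/commands.py"):
    """Return the argument list for the CLI call shown above.

    Hypothetical helper for illustration; not part of paper2html.
    """
    cmd = [sys.executable, script, pdf_path]
    if browser_path is not None:
        # Same option as in the CLI example above.
        cmd.append(f"--browser_path={browser_path}")
    return cmd


def convert(pdf_path, browser_path=None):
    """Run the conversion and raise CalledProcessError if the CLI fails."""
    subprocess.run(build_command(pdf_path, browser_path), check=True)
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.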