pdf2htmlEX issues

convert all PDF content into one web page


Is it possible for pdf2htmlEX to convert all PDF content into one web page without preserving the page-by-page layout of the original PDF? Basically, just extract all PDF...

pwang7

TOC and many internal crossref links?

My PDF (produced in LaTeX) has a TOC and many internal crossref links. How do I import/enable these in the converted HTML? Sorry to post the above. I found much...

sodiumchl

Why is some of the text not extracted, but baked into the generated images?

Some of my text is not extracted properly. I can select it in multiple PDF viewers, so it's definitely text and not part of an image, but in the generated...

isaacfink

Why are the matrix styles needed?

I am using this library to convert PDFs to SVGs. I need to parse out the font and positioning information, and the information I am getting is all wrong because...

isaacfink
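If the "matrix styles" in question are CSS `transform: matrix(a, b, c, d, e, f)` declarations (an assumption, not confirmed by the snippet above), then raw `left`/`bottom`/`font-size` values are only correct after that transform is applied. The minimal sketch below parses such a declaration and applies it to a point using standard CSS semantics. The helper names `parseCssMatrix`, `applyMatrix`, and `axisScales` are hypothetical, and the mapping is relative to the element's `transform-origin`, which this sketch assumes is `0 0`.

```typescript
// A 2D affine transform in CSS matrix(a, b, c, d, e, f) form.
interface CssMatrix {
  a: number; b: number; c: number; d: number; e: number; f: number;
}

// Parse the first "matrix(...)" found in a CSS transform value.
// Returns null unless exactly six finite numbers are present.
function parseCssMatrix(transform: string): CssMatrix | null {
  const m = /matrix\(\s*([^)]*)\)/.exec(transform);
  if (m === null) return null;
  const nums = m[1].split(/[\s,]+/).filter((s) => s.length > 0).map(Number);
  if (nums.length !== 6 || nums.some((n) => !Number.isFinite(n))) return null;
  const [a, b, c, d, e, f] = nums;
  return { a, b, c, d, e, f };
}

// Map a local point through the transform (CSS semantics):
// x' = a*x + c*y + e,  y' = b*x + d*y + f
function applyMatrix(mat: CssMatrix, x: number, y: number): [number, number] {
  return [mat.a * x + mat.c * y + mat.e, mat.b * x + mat.d * y + mat.f];
}

// Length scale along the local x and y axes (shear is ignored).
function axisScales(mat: CssMatrix): { sx: number; sy: number } {
  return { sx: Math.hypot(mat.a, mat.b), sy: Math.hypot(mat.c, mat.d) };
}

const mat = parseCssMatrix("transform: matrix(0.25, 0, 0, 0.25, 0, 0);");
if (mat !== null) {
  console.log(applyMatrix(mat, 4, 8)); // → [1, 2]
  console.log(axisScales(mat)); // → { sx: 0.25, sy: 0.25 }
}
```

Under these assumptions, a glyph's rendered size is roughly the declared `font-size` multiplied by `sy`, which may explain why the unscaled values look wrong.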

Update Poppler to 24.06.1

Hello, it's been a while since our last Poppler update. I've also updated the CI workflow to use Ubuntu 22.04, because newer Poppler requires a newer...

ViliusSutkus89

Update package versions to build on Ubuntu 22.04


This PR changes the Java package requested from `openjdk-8-jre-headless` to `openjdk-17-jre-headless` so that pdf2htmlEX builds on Ubuntu 22.04.

telmop

How to build it with a Node.js base image?


Can anyone share their Dockerfile? Mine is like this:

```
ARG FUNCTION_DIR="/function"
FROM node:20-buster as build-image
# Include global arg in this stage of the build
ARG FUNCTION_DIR
COPY ./pdf2htmlEX.deb...
```

bytrangle

Run pdf2htmlEX with Node.js, get stderr

When I use Node.js to execute `pdf2htmlEX`, it runs successfully, but the messages from pdf2htmlEX end up in `stdout` rather than `stderr`. Here is an example of a simple Node.js script that runs `pdf2htmlEX`:...

bytrangle
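One way to see exactly which stream each message lands on is to capture `stdout` and `stderr` separately rather than merging them. The sketch below does that with Node's `child_process.spawnSync`; it makes no assumption about which stream pdf2htmlEX itself uses. The `runCapture` helper is a hypothetical name, and the demo runs Node as the child so the sketch works without pdf2htmlEX installed.

```typescript
import { spawnSync } from "node:child_process";

// Output of one child-process run, with the two streams kept apart.
interface RunResult {
  status: number | null; // exit code, or null if killed by a signal
  stdout: string;
  stderr: string;
}

// Run a command to completion and capture stdout and stderr separately.
// Throws if the command could not be started (e.g. ENOENT when the
// binary is not on PATH).
function runCapture(cmd: string, args: string[]): RunResult {
  const result = spawnSync(cmd, args, { encoding: "utf8" });
  if (result.error !== undefined) {
    throw result.error;
  }
  return { status: result.status, stdout: result.stdout, stderr: result.stderr };
}

// Demo: Node itself as the child, writing one line to each stream.
const demo = runCapture(process.execPath, [
  "-e",
  "console.log('to stdout'); console.error('to stderr')",
]);
console.log(JSON.stringify(demo.stdout)); // "to stdout\n"
console.log(JSON.stringify(demo.stderr)); // "to stderr\n"
```

For a real conversion you would call something like `runCapture("pdf2htmlEX", ["input.pdf"])` (the file name is a placeholder) and inspect both fields. For long conversions with a lot of output, the asynchronous `spawn` with `'data'` listeners on `child.stdout` and `child.stderr` avoids `spawnSync`'s `maxBuffer` limit.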

Where is /bin/sh script in the tar archive?

This wiki page says that the tar archive contains a `/bin/sh` script, but I can't find where it is.

bytrangle
