
Running on Amazon Lambda

Open JayVem opened this issue 8 years ago • 9 comments

Is it possible to run pdf2htmlEX on Amazon Lambda? Lambda runs on compute instances that use Amazon's own Amazon Linux. I believe this is a great use case for distributed processing, especially for large PDF documents: a conversion that takes up to 15 minutes on a regular i7 desktop processor could be done within a minute on Lambda.

JayVem, Oct 30 '16 11:10

Unlikely ever to happen. Lambda supports only Node.js, Java and Python. pdf2htmlEX is C++ and relies on a whole bunch of supporting libraries.

davidhedley, Nov 15 '16 08:11

Yes, I got it working by compiling it on Amazon Linux and packaging the executable with the Lambda function.

JayVem, Nov 15 '16 17:11

The function itself is very simple: it just invokes pdf2htmlEX as an external system process from Node.js. The difficult part was compiling pdf2htmlEX on Amazon Linux.

JayVem, Nov 15 '16 22:11

Emscripten?

fasiha, Nov 30 '16 16:11

Any progress or alternative solution?

zeckli, Jan 11 '17 07:01

@JayVem Can you point me to instructions, or provide instructions, on how you were able to compile pdf2htmlEX and its dependencies for Lambda? That would be greatly appreciated, as we are attempting to do so without success.

careerlister, Jan 20 '17 18:01

@JayVem I'd also be interested -- I got to a point where I managed to compile pdf2htmlEX on Amazon Linux, but not as a static binary, and I think a static binary will be required here. Are you able to share some insights or the binary itself, or is it proprietary?

ardcore, Aug 31 '17 01:08

@careerlister @ardcore I've managed to get it working on Lambda (finally). The approach I eventually took was:

  1. Build all dependent libraries and pdf2htmlEX from source locally on a lambci/lambda:build-nodejs6.10 Docker build image, to deal with differences between the Lambda Linux environment and the host OS (in my case OSX). I initially tried a remote Amazon Linux EC2 instance, but the output didn't work on Lambda.
  2. Move the required libraries and binaries into a Lambda function on your host machine.
  3. Run it on an image that replicates your desired Lambda environment. My test script was basically: docker run -v "$PWD":/var/task lambci/lambda:nodejs6.10 index.handler '{"some":"event"}', with the handler using a child process inside Node.js to call pdf2htmlEX.
  4. Adjust and repeat steps 1, 2 and 3 until the function works on the local Docker image.
  5. Deploy to Lambda, ensuring your total unzipped deployment size is less than 250 MB (see here for an explanation). Because I tested locally on the Docker image, I didn't have any issues with differing operating environments when migrating over.
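
The size limit in step 5 can be checked before deploying. The following is a minimal sketch, not part of the original workflow: it sums the file sizes under a package directory and compares the total with the 250 MB figure quoted above (interpreted here as 250 * 1024 * 1024 bytes). The dirSize and fitsInLambda names are hypothetical.

```javascript
const fs = require('fs');
const path = require('path');

// 250 MB unzipped limit quoted in the thread, expressed in bytes (assumed MiB).
const LIMIT_BYTES = 250 * 1024 * 1024;

// Recursively sum the sizes of all files under dir.
// lstat is used so symlinks are counted as links rather than followed.
function dirSize(dir) {
  let total = 0;
  for (const name of fs.readdirSync(dir)) {
    const full = path.join(dir, name);
    const stat = fs.lstatSync(full);
    total += stat.isDirectory() ? dirSize(full) : stat.size;
  }
  return total;
}

// True when the unpacked package is within the limit.
function fitsInLambda(dir) {
  return dirSize(dir) <= LIMIT_BYTES;
}
```

Calling fitsInLambda('.') from the function's directory before zipping gives an early warning if the bundled libraries have pushed the package over the limit.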

dengelke, Nov 03 '17 00:11

@JayVem Can you tell me how you installed pdf2htmlEX on Amazon Linux? I can't install it. I'm on Ubuntu 18.

gutitrombotto, Nov 05 '18 20:11