nextjs-pdf-parser
Next.js template for seamless PDF parsing using pdf2json and FilePond. Ideal for developers seeking a ready-to-use solution for PDF content extraction in Next.js projects.
Next.js PDF Parser Template 📄🔍
https://github.com/tuffstuff9/nextjs-pdf-parser/assets/57072903/c9e5e5eb-ceeb-4947-b26c-11f87bb26312
Introduction
I was having some trouble parsing PDFs in Next.js, so I thought I would make this template for anyone else who was facing the same issues as me. I hope this template saves you some time and trouble. It's a basic create-next-app with PDF parsing implemented using the pdf2json library and file uploading facilitated by FilePond.
Installation & Setup 🚀
- Clone the repository:
    git clone [repository-url]
- Navigate to the project directory:
    cd nextjs-pdf-parser
- Install dependencies:
    npm install # or yarn install
- Windows only: In app\api\upload\route.ts on line 22, change tempFilePath to a valid path. Make sure it starts from the root drive, for example: C:/coding/nextjs-pdf-parser/public/${fileName}.pdf
- Run the development server:
    npm run dev # or yarn dev
  Visit http://localhost:3000 to view the application.
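If you would rather not edit the path by hand on each machine, one option is to build it from the operating system's temp directory. This is a minimal sketch, assuming tempFilePath only needs to be a writable absolute path; buildTempPdfPath is a hypothetical helper and is not part of the template.

```typescript
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical helper (not in the template): returns an absolute path for the
// temporary PDF inside the OS temp directory, so it works on Windows, macOS, and Linux.
function buildTempPdfPath(fileName: string): string {
  // Replace anything that is not a letter, digit, dash, or underscore so the
  // name cannot contain path separators or escape the temp directory.
  const safeName = fileName.replace(/[^a-zA-Z0-9_-]/g, "_");
  return join(tmpdir(), `${safeName}.pdf`);
}
```

In the route handler this would replace the hard-coded string, for example: const tempFilePath = buildTempPdfPath(fileName);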
Usage 🖱
Navigate to http://localhost:3000 and use the FilePond uploader to select and upload a PDF. Once uploaded, the PDF's content is parsed and printed to the server console (note: it is not printed to the browser console).
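pdf2json reports results through events, so the parsed text only exists once the pdfParser_dataReady event has fired. The sketch below shows one way to wrap that in a Promise. The TextParser interface and the waitForRawText helper are hypothetical names for illustration and are not necessarily how the template's route is written; only the two event names and getRawTextContent() come from pdf2json.

```typescript
// Minimal shape this sketch relies on (an assumption, not pdf2json's full type).
interface TextParser {
  on(event: string, listener: (payload: unknown) => void): unknown;
  getRawTextContent(): string;
}

// Hypothetical helper: resolves with the raw text once parsing finishes,
// or rejects if the parser reports an error.
function waitForRawText(parser: TextParser, start: () => void): Promise<string> {
  return new Promise<string>((resolve, reject) => {
    parser.on("pdfParser_dataError", (err) => reject(err));
    parser.on("pdfParser_dataReady", () => resolve(parser.getRawTextContent()));
    // Register the listeners first, then kick off parsing.
    start();
  });
}

// Usage inside a server-side handler (assumes pdfParser and tempFilePath exist):
//   const text = await waitForRawText(pdfParser, () => pdfParser.loadPDF(tempFilePath));
//   console.log(text); // appears in the server console, not the browser
```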
Technical Details 🛠
- nodeUtil is not defined Error:
  To bypass the nodeUtil is not defined error, the following configuration was added to next.config.js:
const nextConfig = {
experimental: {
serverComponentsExternalPackages: ['pdf2json'],
},
};
module.exports = nextConfig;
See more details here
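If you move the template to Next.js 15 or later, note that this option was promoted out of experimental and renamed to serverExternalPackages. A sketch of the equivalent configuration, assuming Next.js 15+ and a next.config.ts file (neither of which the template itself uses):

```typescript
// next.config.ts (assumes Next.js 15 or later; the template ships next.config.js)
import type { NextConfig } from "next";

const nextConfig: NextConfig = {
  // Same effect as experimental.serverComponentsExternalPackages in older versions:
  // pdf2json is loaded with Node's native require instead of being bundled.
  serverExternalPackages: ["pdf2json"],
};

export default nextConfig;
```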
- Blank output from pdfParser.getRawTextContent():
  This issue might be due to incorrect type definitions: raw text is only collected when PDFParser is constructed with its second argument set to 1, and the type definitions may not accept constructor arguments. There are two potential solutions:
  - Fix TypeScript definitions: Update the type definition for PDFParser.
  - Bypass type checking: Instantiate PDFParser as shown:
      const pdfParser = new (PDFParser as any)(null, 1);
For more details, refer to my comment on this GitHub issue.
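A middle ground between the two options is to cast the constructor to a narrow local type instead of any, so the rest of the code stays type-checked. This is only a sketch: RawTextParser, RawTextParserCtor, and createRawTextParser are hypothetical names, and the constructor signature is an assumption based on the (null, 1) call above.

```typescript
// Only the members this sketch needs (an assumption, not pdf2json's full API).
interface RawTextParser {
  getRawTextContent(): string;
}

// Assumed constructor shape, mirroring the (null, 1) call shown above.
type RawTextParserCtor = new (context: null, needRawText: number) => RawTextParser;

// Hypothetical factory: pass in the PDFParser class imported from pdf2json.
function createRawTextParser(ParserClass: unknown): RawTextParser {
  // The cast is confined to this one line instead of spreading `any` around.
  return new (ParserClass as RawTextParserCtor)(null, 1);
}
```

With pdf2json this would be called as createRawTextParser(PDFParser); extend the interface with any other members you use, such as on or loadPDF.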
Acknowledgements 🙏
A special thanks to the following libraries and their contributors:
- FilePond: For providing a seamless and user-friendly file uploading experience.
- pdf2json: For its efficient and robust PDF parsing capabilities.
License 📜
MIT License