OpenPDF
Convert pdf to images
Hello, I have some questions:
1. Can I convert a multi-page PDF document to PNG images with OpenPDF?
2. Is it possible to generate PDF/A output?
3. Is there any way to compress an existing PDF?
Thank you very much for your help.
This is the method I wrote for this. I hope it is useful!
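/**
* Converts a PDF into long images, stitching every perSize pages into one image.
* @param in the PDF document as a byte array
* @param perSize number of pages per output image; defaults to 1 when null
* @return the generated images as PNG-encoded byte arrays
*/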
public static List<byte[]> pdfToImage(byte[] in, Integer perSize) {
List<byte[]> bms = new ArrayList<>();
try {
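/* Parameters used when merging the images */
// Width of the merged image
int width = 0;
// RGB data of a single page image
int[] singleImgRGB;
// Vertical offset, accumulated as pages are stacked
int shiftHeight = 0;
// Merged image that receives the pixels of each page
BufferedImage imageResult = null;
// Render the pages with Apache PDFBox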
PDDocument pdDocument = PDDocument.load(in);
PDFRenderer renderer = new PDFRenderer(pdDocument);
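/* Split the document into groups of perSize pages; each group becomes one long image */
// Default to one page per image
perSize = perSize == null ? 1 : perSize;
// Number of output images, i.e. iterations of the outer loop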
int totalCount = getPages(pdDocument.getNumberOfPages(), perSize);
for (int m = 0; m < totalCount; m++) {
for (int i = 0; i < perSize; i++) {
int pageIndex = i + (m * perSize);
if (pageIndex == pdDocument.getNumberOfPages()) {
break;
}
// 144 is the rendering DPI; a higher DPI gives a sharper but larger image and a slower conversion
BufferedImage image = renderer.renderImageWithDPI(pageIndex, 144, ImageType.RGB);
int imageHeight = image.getHeight();
int imageWidth = image.getWidth();
if (i == 0) {
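// Compute the canvas size and offset
// Use the width of the first page in the group
width = imageWidth;
// Allocate the canvas for this group (this assumes every page has the same size as the first one)
// If fewer than perSize pages remain, size the canvas for the remaining pages only; otherwise the bottom of the long image would be black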
if ((pdDocument.getNumberOfPages() - m * perSize) < perSize) {
imageResult = new BufferedImage(width, imageHeight * (pdDocument.getNumberOfPages() - m * perSize), BufferedImage.TYPE_INT_RGB);
} else {
imageResult = new BufferedImage(width, imageHeight * perSize, BufferedImage.TYPE_INT_RGB);
}
} else {
// Accumulate the vertical offset
shiftHeight += imageHeight;
}
singleImgRGB = image.getRGB(0, 0, width, imageHeight, null, 0, width);
imageResult.setRGB(0, shiftHeight, width, imageHeight, singleImgRGB, 0, width);
}
// Convert the image to byte[]
ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
ImageIO.write(imageResult, "png", byteArrayOutputStream);
byteArrayOutputStream.flush();
bms.add(byteArrayOutputStream.toByteArray());
byteArrayOutputStream.close();
// Write the image to a file
//File outFile = new File(pdfPath.replace(".pdf", "_" + m + ".jpg"));
//ImageIO.write(imageResult, "jpg", outFile);
shiftHeight = 0;
}
pdDocument.close();
} catch (Exception e) {
log.error("Failed to convert PDF to images", e);
}
return bms;
}
/*
* Computes the number of output images (groups of pageSize pages)
*/
private static int getPages(int counts, int pageSize) {
if(counts == 0) {
return 0;
} else if (counts <= pageSize) {
return 1;
} else if (counts%pageSize!=0) {
return counts / pageSize + 1;
} else {
return counts / pageSize;
}
}
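The method above sizes the canvas from the first page of each group, so it relies on all pages having the same dimensions. As a rough sketch of one way to relax that, the stitching step could be done with plain `java.awt` instead of copying RGB arrays. The class `PageStacker` and its methods are hypothetical names for illustration only; they are not part of OpenPDF or PDFBox, and the page images are assumed to come from whatever renderer is used.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.util.List;

/** Hypothetical helper for illustration; not part of OpenPDF or PDFBox. */
final class PageStacker {

    private PageStacker() {
    }

    /** Number of output images needed for pageCount pages (ceiling division). */
    static int groupCount(int pageCount, int perImage) {
        if (perImage <= 0) {
            throw new IllegalArgumentException("perImage must be positive");
        }
        return (pageCount + perImage - 1) / perImage;
    }

    /**
     * Stacks the given page images top to bottom on a white canvas.
     * The canvas is as wide as the widest page and as tall as the sum of all heights.
     */
    static BufferedImage stackVertically(List<BufferedImage> pages) {
        if (pages.isEmpty()) {
            throw new IllegalArgumentException("no pages to stack");
        }
        int width = 0;
        int height = 0;
        for (BufferedImage page : pages) {
            width = Math.max(width, page.getWidth());
            height += page.getHeight();
        }
        BufferedImage result = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = result.createGraphics();
        try {
            // Fill with white so narrower pages do not leave black margins
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, width, height);
            int y = 0;
            for (BufferedImage page : pages) {
                g.drawImage(page, 0, y, null);
                // Advance by the height of the page just drawn
                y += page.getHeight();
            }
        } finally {
            g.dispose();
        }
        return result;
    }
}
```

With this approach the offset advances by the height of the page that was just drawn, so pages of mixed sizes do not overlap or run past the canvas.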
Thanks for sharing! Can you please submit this code as a pull request to OpenPDF? Create a new Java class for it. Then we can add this as a new useful high-level function in the library.
This is a nice one for anybody wanting to contribute.
- Grab the code of https://github.com/LibrePDF/OpenPDF/issues/346#issuecomment-651729163
- Create some new utility class for that method
- Clean up the code a little (my Chinese is not that good :-) )
- Write some unit tests
- Create a Pull Request
This code uses Apache PDFBox, not LibrePDF.
Just answering 2 questions:
- Yes, OpenPDF can generate PDF/A
- I'm not aware of any part of OpenPDF for compressing existing PDFs. There are some other nice (non-Java) tools which can manipulate PostScript and PDF. Maybe Ghostscript is a way to do so.
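For the Ghostscript suggestion, a minimal sketch of calling it from Java could look like the following. It assumes a `gs` executable is on the `PATH` (on Windows the console binary is usually named `gswin64c`), and the class `GhostscriptCompress` is a hypothetical name, not an OpenPDF API. The preset is one of Ghostscript's `-dPDFSETTINGS` values such as `screen`, `ebook`, `printer` or `prepress`.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.List;

/** Hypothetical wrapper around the Ghostscript command line; not part of OpenPDF. */
final class GhostscriptCompress {

    private GhostscriptCompress() {
    }

    /** Builds the Ghostscript command that rewrites input to output with the given preset. */
    static List<String> buildCommand(Path input, Path output, String preset) {
        return List.of(
                "gs",                        // assumed executable name
                "-sDEVICE=pdfwrite",         // write a PDF again
                "-dPDFSETTINGS=/" + preset,  // e.g. screen, ebook, printer, prepress
                "-dNOPAUSE",
                "-dBATCH",
                "-dQUIET",
                "-sOutputFile=" + output,
                input.toString());
    }

    /** Runs Ghostscript and returns its exit code (0 on success). */
    static int compress(Path input, Path output, String preset)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(input, output, preset))
                .inheritIO()
                .start();
        return process.waitFor();
    }
}
```

How much the file shrinks depends on the document; presets like `screen` downsample images more aggressively, which trades quality for size.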
@asturio The labels "good first issue" and "task" could probably be removed from this issue. The code is for Apache PDFBox and I don't see an easy way to just add that functionality. This issue does not need continuous attention the way #145 and #152, which have the "task" label, do.
Hi @zengleo, did you make any changes to add this functionality?
Hi, is there anyone working on this issue? Can I take it?
Sure, please submit a pull request for this. @ObsisMc