OpenPDF

Convert pdf to images

Open sabraMa opened this issue 4 years ago • 9 comments

Hello, I have some questions: 1. Can I convert a multi-page PDF document to PNG images with OpenPDF? 2. Is it possible to generate PDF/A? 3. Is there any way to compress an existing PDF?

Thank you very much for your help.

sabraMa avatar Mar 18 '20 13:03 sabraMa

Here is the method I wrote for this. I hope it will be useful!

	// Needs: java.awt.image.BufferedImage, java.io.ByteArrayOutputStream,
	// java.util.ArrayList, java.util.List, javax.imageio.ImageIO,
	// org.apache.pdfbox.pdmodel.PDDocument,
	// org.apache.pdfbox.rendering.ImageType, org.apache.pdfbox.rendering.PDFRenderer

	/**
	 * Renders a PDF to PNG images, stitching every perSize consecutive pages
	 * into one tall image.
	 * @param in the PDF file content
	 * @param perSize number of pages per output image (defaults to 1 when null)
	 * @return one PNG-encoded byte array per output image
	 */
	public static List<byte[]> pdfToImage(byte[] in, Integer perSize) {
		List<byte[]> bms = new ArrayList<>();
		// Render the pages with PDFBox; try-with-resources closes the document even on failure
		try (PDDocument pdDocument = PDDocument.load(in)) {
			/* Variables used for merging the page images */
			// Width of the merged image
			int width = 0;
			// RGB data of a single page image
			int[] singleImgRGB;
			// Vertical offset at which the next page is drawn
			int shiftHeight = 0;
			// Merged image for the current group of pages
			BufferedImage imageResult = null;
			PDFRenderer renderer = new PDFRenderer(pdDocument);
			/* Split the document into groups of perSize pages, one tall image per group */
			perSize = perSize == null ? 1 : perSize;
			int pageCount = pdDocument.getNumberOfPages();
			// Number of output images
			int totalCount = getPages(pageCount, perSize);
			for (int m = 0; m < totalCount; m++) {
				for (int i = 0; i < perSize; i++) {
					int pageIndex = i + (m * perSize);
					if (pageIndex == pageCount) {
						break;
					}
					// 144 is the image DPI: a higher DPI gives a sharper but larger image and a slower conversion
					BufferedImage image = renderer.renderImageWithDPI(pageIndex, 144, ImageType.RGB);
					int imageHeight = image.getHeight();
					if (i == 0) {
						// Use the first page's width for the merged image
						width = image.getWidth();
						// Size the canvas for the pages actually in this group: if fewer than
						// perSize pages remain, use the remaining count, otherwise the bottom
						// of the tall image would be solid black.
						// Assumes every page in the group has the same size as the first one.
						int pagesInGroup = Math.min(perSize, pageCount - m * perSize);
						imageResult = new BufferedImage(width, imageHeight * pagesInGroup, BufferedImage.TYPE_INT_RGB);
					}
					singleImgRGB = image.getRGB(0, 0, width, imageHeight, null, 0, width);
					imageResult.setRGB(0, shiftHeight, width, imageHeight, singleImgRGB, 0, width);
					// Advance the offset by the height of the page just drawn
					shiftHeight += imageHeight;
				}
				// Encode the merged image as PNG bytes
				ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
				ImageIO.write(imageResult, "png", byteArrayOutputStream);
				bms.add(byteArrayOutputStream.toByteArray());
				// Alternatively, write the image to a file:
				//File outFile = new File(pdfPath.replace(".pdf", "_" + m + ".jpg"));
				//ImageIO.write(imageResult, "jpg", outFile);
				shiftHeight = 0;
			}
		} catch (Exception e) {
			log.error("Failed to convert PDF to images", e);
		}
		return bms;
	}

	/*
	 * Computes the number of output images: counts pages split into
	 * groups of pageSize, rounded up.
	 */
	private static int getPages(int counts, int pageSize) {
		if (counts == 0) {
			return 0;
		}
		// Ceiling division: a partially filled last group still needs its own image
		return (counts + pageSize - 1) / pageSize;
	}
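
For anyone who wants to reuse only the stitching step: it does not need a PDF library at all, just java.awt. Below is a minimal sketch of that part, under the assumption that the pages have already been rendered to BufferedImage objects by some renderer. The names PageStitcher, groupCount, stitchVertically and stitchInGroups are made up for this illustration and are not part of OpenPDF or PDFBox. Unlike the method above, it sums the real page heights, so pages of different sizes should also work.

```java
import java.awt.image.BufferedImage;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of OpenPDF or PDFBox.
final class PageStitcher {

    /** Number of output images for pageCount pages at perSize pages each (ceiling division). */
    static int groupCount(int pageCount, int perSize) {
        return (pageCount + perSize - 1) / perSize;
    }

    /** Stacks the given page images top to bottom on one canvas. The list must not be empty. */
    static BufferedImage stitchVertically(List<BufferedImage> pages) {
        int width = 0;
        int height = 0;
        for (BufferedImage page : pages) {
            width = Math.max(width, page.getWidth()); // canvas is as wide as the widest page
            height += page.getHeight();               // and as tall as all pages together
        }
        BufferedImage result = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        int y = 0;
        for (BufferedImage page : pages) {
            int w = page.getWidth();
            int h = page.getHeight();
            int[] rgb = page.getRGB(0, 0, w, h, null, 0, w);
            // Narrower pages leave the area to their right black (the canvas default)
            result.setRGB(0, y, w, h, rgb, 0, w);
            y += h;
        }
        return result;
    }

    /** Splits the pages into groups of perSize and stitches each group into one image. */
    static List<BufferedImage> stitchInGroups(List<BufferedImage> pages, int perSize) {
        List<BufferedImage> out = new ArrayList<>();
        int groups = groupCount(pages.size(), perSize);
        for (int g = 0; g < groups; g++) {
            int from = g * perSize;
            int to = Math.min(from + perSize, pages.size());
            out.add(stitchVertically(pages.subList(from, to)));
        }
        return out;
    }
}
```

Each stitched image can then be encoded with ImageIO.write(image, "png", stream), as in the method above.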

doobo avatar Jun 30 '20 11:06 doobo

Thanks for sharing! Can you please submit this code as a pull request to OpenPDF? Create a new Java class for it. Then we can add this as a new useful high-level function in the library.

andreasrosdal avatar Jun 30 '20 20:06 andreasrosdal

This is a nice one for anybody wanting to contribute.

  • Grab the code from https://github.com/LibrePDF/OpenPDF/issues/346#issuecomment-651729163
  • Create a new utility class for that method
  • Clean up the code a little bit (my Chinese is not that good :-) )
  • Write some unit tests
  • Create a pull request

asturio avatar Feb 04 '21 18:02 asturio

This code uses Apache PDFBox, not LibrePDF.

GreenToad avatar Mar 02 '21 17:03 GreenToad

Just answering two of the questions:

  1. Yes, OpenPDF can generate PDF/A.
  2. I'm not aware of any part of OpenPDF for compressing existing PDFs. There are some other nice (non-Java) tools that can manipulate PostScript and PDF. Maybe Ghostscript is a way to do so.

asturio avatar Mar 20 '21 12:03 asturio

@asturio The labels "good first issue" and "task" could probably be removed from this issue. The code is for Apache PDFBox, and I don't see an easy way to just add that functionality. This issue does not need continuous attention the way #145 and #152, which also have the task label, do.

mluppi avatar Jul 17 '21 18:07 mluppi

Hi @zengleo, did you make changes to add this functionality?

bhupendersinghh avatar Jan 25 '22 08:01 bhupendersinghh

Hi, is there anyone working on this issue? Can I take it?

ObsisMc avatar Mar 08 '22 12:03 ObsisMc

Sure, please submit a pull request for this. @ObsisMc

andreasrosdal avatar Jun 15 '22 22:06 andreasrosdal