Compressing before uploading

Open hfoffani opened this issue 6 years ago • 7 comments

Hi, thanks for the tools! In the "Printing to reMarkable" script I have added a PDF compression step (using Ghostscript). It looks like this:

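for f in "$@"
do
	# Work on a temp copy with an ASCII-only name.
	cp "$f" /tmp/rmk$$.pdf
	/usr/local/bin/gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 \
		-dPDFSETTINGS=/ebook -dNOPAUSE -dQUIET -dBATCH \
		-sOutputFile=/tmp/ormk$$.pdf /tmp/rmk$$.pdf
	# Overwrite the original with the compressed copy, then upload it.
	cp /tmp/ormk$$.pdf "$f"
	/usr/local/bin/rmapi put "$f"
	# Remove the temp files so they do not pile up in /tmp.
	rm -f /tmp/rmk$$.pdf /tmp/ormk$$.pdf
done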

The temp files are required because Ghostscript doesn't handle Unicode file names transparently. An if statement should also be added to check that Ghostscript is properly installed.
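For what it's worth, the two points above (temp files with ASCII-only names, plus a check that Ghostscript is installed) might be folded into the script roughly as follows. This is only a sketch under assumptions: the `GS` and `RMAPI` variables, the `compress_and_put` function name, and the use of `mktemp -d` are not part of the original script. Like the original, it overwrites each input file with its compressed copy before uploading.

```shell
#!/bin/sh
# Sketch only. GS and RMAPI are hypothetical overridable variables that
# default to the paths used in the original script.
GS="${GS:-/usr/local/bin/gs}"
RMAPI="${RMAPI:-/usr/local/bin/rmapi}"

# compress_and_put is a hypothetical helper name, not part of rmapi.
compress_and_put() {
	# Refuse to run if Ghostscript is not available.
	if ! command -v "$GS" >/dev/null 2>&1; then
		echo "ghostscript not found: $GS" >&2
		return 1
	fi

	# Private temp directory; the file names inside it are ASCII-only.
	tmpdir=$(mktemp -d) || return 1

	for f in "$@"; do
		# Stop at the first failing step so a broken output
		# never overwrites the original file.
		cp "$f" "$tmpdir/in.pdf" &&
		"$GS" -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 \
			-dPDFSETTINGS=/ebook -dNOPAUSE -dQUIET -dBATCH \
			-sOutputFile="$tmpdir/out.pdf" "$tmpdir/in.pdf" &&
		cp "$tmpdir/out.pdf" "$f" &&
		"$RMAPI" put "$f" || { rm -rf "$tmpdir"; return 1; }
	done

	rm -rf "$tmpdir"
}
```

The script would then end with a call such as `compress_and_put "$@"`.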

hfoffani avatar Apr 22 '18 12:04 hfoffani

Hello @hfoffani, I know this is quite an old issue, but what was the purpose of this compression? To save space, I guess? Do you have measurements of the space saved (before and after compression)? Once compressed, is the PDF still readable on the device with good enough quality? And lastly, this could be interesting, but it would be even nicer if there were a Go implementation of PDF compression. Maybe you know of one? Thanks!

lobre avatar Feb 21 '19 16:02 lobre

Hi @lobre. To save space, yes. In my last batch of 21 files, the average size reduction was 44%. I don't see any difference in the quality of the output. I guess it would be enough to just compress the embedded images, since that's where the main gains are. AFAICS there are many open source PDF generators for Go, but apparently none of them provides this functionality.
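A per-file reduction figure like the one quoted above can be computed as (before - after) / before. Here is a minimal sketch of such a measurement; the `size_reduction` helper name is hypothetical, and this is not necessarily how the 44% average was obtained.

```shell
#!/bin/sh
# size_reduction ORIGINAL COMPRESSED
# Prints the size reduction in percent (integer, truncated toward zero).
# Hypothetical helper for illustration only.
size_reduction() {
	[ -f "$1" ] && [ -f "$2" ] || return 1
	# wc -c may pad its output with spaces on some systems; strip them.
	before=$(wc -c < "$1" | tr -d ' ')
	after=$(wc -c < "$2" | tr -d ' ')
	# Avoid dividing by zero for an empty original.
	[ "$before" -gt 0 ] || return 1
	echo $(( (before - after) * 100 / before ))
}
```

For example, a 100-byte original and a 56-byte compressed copy give `44`. Keeping a copy of each original before compressing would be needed to run this on a real batch, since the script in this issue overwrites the input file.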

hfoffani avatar Feb 22 '19 12:02 hfoffani

I just ran a quick search on GitHub for PDF optimization in Go and found this repo.

https://github.com/hhrutter/pdfcpu

One of the stated motivations is the following:

One example is reducing the size of large PDF files for mass mailings by optimization to the bare minimum.

@juruen has used this library for PDF generation for downloading and generating PDF annotations from the tablet. I need to check if it has compression features...

https://github.com/jung-kurt/gofpdf

Don't know if that could help. But otherwise, if it is just about image compression, maybe it is doable in a custom package.

lobre avatar Feb 25 '19 09:02 lobre

I've seen the first one, but according to https://github.com/hhrutter/pdfcpu/issues/6 it's not possible yet.

@juruen has used this library for PDF generation for downloading and generating PDF annotations from the tablet. I need to check if it has compression features...

The previous issue essentially specifies the tasks needed. However, I don't think you need to cover all possible PDF image encodings and all outputs.

hfoffani avatar Feb 25 '19 12:02 hfoffani

Indeed, supporting many image formats seems like a massive amount of work. We could try to identify the most commonly used ones. But I guess it would still make sense to do this in the upstream pdfcpu project rather than separately here.

I have also seen this project.

https://github.com/unidoc/unidoc

In the coming v3, there seems to be image optimisation. I need to explore the branch to check if that answers our problem.

The README says:

PDF compression and optimization of outputs with several options 1) combining duplicates, 2) compressed object streams, 3) image points per inch threshold, 4) image quality.

lobre avatar Feb 26 '19 08:02 lobre

I've seen UniDoc too, but I thought it was not free: https://unidoc.io/pricing/ Apparently the first paragraph does say "when not releasing your source code under AGPLv3". So could it work then?

hfoffani avatar Feb 26 '19 09:02 hfoffani

I am not very experienced with licenses, but they mention the following.

This library (UniDoc) has a dual license, a commercial one suitable for closed source projects and an AGPL license that can be used in open source software.

Here is the AGPL one.

So I would say it could work...

lobre avatar Feb 26 '19 10:02 lobre