PDF output size

Open neodynamic opened this issue 5 years ago • 30 comments

Running this sample https://github.com/mono/SkiaSharp/blob/master/samples/Gallery/Shared/Samples/CreatePdfSample.cs, which creates a simple two-page PDF file, produces a file of about 510 KB on Windows.

Is there any compression setting that would make the output PDF smaller? 510 KB seems rather heavy for a two-page PDF containing only simple text. Any hints?

neodynamic avatar May 22 '19 20:05 neodynamic

@neodynamic Can you see what Skia produces?

charlesroddie avatar Jun 10 '19 11:06 charlesroddie

@charlesroddie no, and you?

neodynamic avatar Jun 10 '19 11:06 neodynamic

You could have a look at setting the quality down: https://docs.microsoft.com/en-us/dotnet/api/skiasharp.skdocumentpdfmetadata

mattleibow avatar Jun 24 '19 02:06 mattleibow

We've been reviewing this matter and have concluded that the PDF output file size cannot be improved, for the following reason: it seems that the PDF backend of Skia (the native lib) embeds, by design, any font file needed to render the text on the target device. That page states the following:

We can't assume that an arbitrary font will be available at PDF view time, so we embed all fonts in accordance with modern PDF guidelines.

The sample here https://github.com/mono/SkiaSharp/blob/master/samples/Gallery/Shared/Samples/CreatePdfSample.cs uses the system default font, which on Windows is likely to be Segoe UI, whose TTF file is about 900 KB. The output PDF for that simple test, where a single text is drawn, is about 510 KB. Such a simple PDF is that big because Skia, by design, embeds the Segoe UI font file in it. We ran another test drawing Chinese text with the Yu Gothic font, whose file is about 13 MB, and the output PDF was about 8 MB, which confirms that the size comes from the embedded font files. Linked fonts, which could make the output PDF smaller, do not seem to be supported.

If no one here has more comments on this matter, then @mattleibow you can close this issue.

neodynamic avatar Jul 10 '20 14:07 neodynamic
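One way to check the embedding explanation on a given file is to look for font-program references in the PDF. The sketch below is a rough heuristic, not a PDF parser: it scans the raw bytes for the `/FontFile`, `/FontFile2` and `/FontFile3` keys that a font descriptor uses to point at an embedded font program. The function name is made up for illustration, and the scan assumes the font descriptors are stored uncompressed in the file, which may not hold for every PDF producer.

```python
import re

# A PDF font descriptor references an embedded font program through one of
# these keys: /FontFile (Type 1), /FontFile2 (TrueType), /FontFile3 (CFF/OpenType).
FONT_FILE_KEY = re.compile(rb"/FontFile[23]?\b")


def count_embedded_font_refs(pdf_bytes: bytes) -> int:
    """Count font-program references visible in the raw bytes of a PDF.

    Heuristic only: descriptors stored inside compressed object streams
    are not visible to this scan.
    """
    return len(FONT_FILE_KEY.findall(pdf_bytes))


# Hypothetical fragment of a font descriptor, for illustration only.
fragment = b"<< /Type /FontDescriptor /FontName /SegoeUI /FontFile2 12 0 R >>"
print(count_embedded_font_refs(fragment))
```

To try it on a real file, read the file in binary mode and pass the bytes to `count_embedded_font_refs`.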

You could use HarfBuzz's subsetting to reduce the font's size. Then load that font to produce the PDF. Sadly this isn't supported by HarfBuzzSharp. Yet....

Gillibald avatar Jul 10 '20 15:07 Gillibald

Yes, that could be the only way to reduce pdf output file size...

neodynamic avatar Jul 10 '20 15:07 neodynamic

Looking at the Skia code, it seems there are two subsetters built in. But they are disabled because we are not building with either icu or harfbuzz/sfntly.

However, there is a hook that makes subsetting work, but it is not a "public API". But, since it is fairly simple, we might be able to do something. The API hasn't changed much, so it might just be safe to do something.

I'll have a look at what we can do. I can't promise anything, as I haven't looked at exactly how the PDF is constructed, but it seems to only write the fonts when the PDF is closed, so we could potentially add an argument there, or in the metadata in the constructor. They actually have an enum there that lets you pick either harfbuzz or sfntly. It doesn't seem too hard to add one for us, and then we can use any font subsetter.

mattleibow avatar Jul 10 '20 19:07 mattleibow

Started a thing on the skia bugtracker. I want to do this right: https://bugs.chromium.org/p/skia/issues/detail?id=10491 And discussion: https://groups.google.com/g/skia-discuss/c/XIvDEEwZrAM

mattleibow avatar Jul 10 '20 22:07 mattleibow

You could have a look at setting the quality down: https://docs.microsoft.com/en-us/dotnet/api/skiasharp.skdocumentpdfmetadata

Hi @mattleibow. I've tried to set lower EncodingQuality and RasterDpi and they have no impact on the output file at all. It outputs the same file size and quality. Latest SkiaSharp on Windows 10.

Alexbits avatar Aug 26 '20 07:08 Alexbits

Any progress on this? Japanese/Chinese fonts are easily 10MB+ (per weight), so this becomes nigh unusable.

reinux avatar Aug 23 '21 04:08 reinux

Can't believe this is an issue. You should let the developer choose whether to embed the font file or not.

johmarjac avatar Oct 19 '21 21:10 johmarjac

@mattleibow I was able to build the Windows libSkiaSharp using Skia's support for Harfbuzz subsetting. It seems to work fine. My test PDF that was over 280 KB went down to less than 10 KB with the subsetting. Other than changing the Skia build switches, the only thing I had to do was edit Skia's Harfbuzz BUILD.gn since the forked version appears to be out of sync with the Harfbuzz commit in the DEPS.

Can you think of any reason that this wouldn't be a viable solution?

jeffska avatar Oct 27 '21 18:10 jeffska

any updates regarding the file size/fonts?

wstaelens avatar Jan 04 '22 13:01 wstaelens

Should I assume this has been abandoned and rebuild my project using another PDF library?

reinux avatar Jan 14 '22 07:01 reinux

Should I assume this has been abandoned and rebuild my project using another PDF library?

In case you go for a different library, don't use QuestPDF as it uses SkiaSharp under the hood and suffers from the same big file sizes.

johmarjac avatar Jan 14 '22 07:01 johmarjac

Thanks for the tip. I don't understand how something like this wouldn't be recognized as a fatal issue.

If no one here has more comments on this matter, then @mattleibow you can close this issue.

Like, what.

reinux avatar Jan 14 '22 08:01 reinux

Thanks for the tip. I don't understand how something like this wouldn't be recognized as a fatal issue.

If no one here has more comments on this matter, then @mattleibow you can close this issue.

Like, what.

Depends on the use case, really. If you only generate a single PDF, no one cares about a 2 MB PDF on their PC. But I needed it for a production series with a part protocol: there is a part every 2 seconds, so every 2 seconds I need to save a PDF to a network share to archive the part's measurement results. A 2 MB file every 2 seconds then costs a hell of a lot of storage, and that's just not going to work.

johmarjac avatar Jan 14 '22 08:01 johmarjac
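To put a number on the storage concern above, here is a small back-of-the-envelope calculation. The 2 MB size and the 2-second interval come from the comment; the 200 KB comparison size is only an assumed figure for a PDF without a full embedded font, and sizes use decimal units (1 MB = 1000 KB).

```python
SECONDS_PER_DAY = 24 * 60 * 60


def daily_storage_kb(file_size_kb: int, interval_seconds: int) -> int:
    """Storage written per day, in KB, when one file is saved every interval."""
    files_per_day = SECONDS_PER_DAY // interval_seconds
    return files_per_day * file_size_kb


# 2 MB every 2 seconds, as described in the comment above.
large = daily_storage_kb(2000, 2)
# Assumed 200 KB per file, for comparison only.
small = daily_storage_kb(200, 2)

print(f"{large / 1_000_000:.1f} GB/day vs {small / 1_000_000:.2f} GB/day")
```

With these inputs the line produces 43,200 files per day, so the per-file difference is multiplied 43,200 times every day.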

Indeed, thousands of small files are processed. 1000 * 2 MB (while it is normally around 159 KB - 236 KB) makes a big difference in network traffic, processing time, disk space, etc.

It is related to fonts, but should be investigated by Skia...

Another reason: most ISP mailboxes and corporate policies still have an email size limit of 10 MB or 15 MB, meaning 5 attachments vs. 15 - 20...

wstaelens avatar Jan 14 '22 09:01 wstaelens

Even though I got the native Skia subsetting working with a custom build, I wasn't happy with it. It's a very naive implementation, and it doesn't perform well for larger (like CJK) fonts. It's better than nothing, but it wasn't sufficient for my use.

I ended up using a two-pass approach by building the font subsets before rendering and then passing those in to SkiaSharp.

jeffska avatar Jan 14 '22 13:01 jeffska
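The comment above does not include any code, so the following is only a guess at what the first pass of such a two-pass approach might look like: walk over the text that will be drawn and record which Unicode code points each font needs. The function name and the `(font name, text)` input shape are assumptions for illustration. Producing the actual subset font from these sets is left to an external subsetter and is not shown here.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def collect_codepoints(runs: Iterable[Tuple[str, str]]) -> Dict[str, Set[int]]:
    """First pass: map each font name to the set of code points drawn with it.

    `runs` is a hypothetical list of (font name, text) pairs describing
    everything the document will render.
    """
    used: Dict[str, Set[int]] = defaultdict(set)
    for font_name, text in runs:
        used[font_name].update(ord(ch) for ch in text)
    return dict(used)


# Made-up text runs, for illustration only.
runs = [
    ("Yu Gothic", "日本語"),
    ("Segoe UI", "Hello"),
    ("Yu Gothic", "日本"),
]
for font, codepoints in collect_codepoints(runs).items():
    print(font, len(codepoints))
```

Note that code points are only an approximation of what a subsetter needs: text shaping can substitute glyphs (ligatures, contextual forms), so a real implementation would likely have to account for the shaped glyphs as well.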

Is there a way to force Skia to render all text as paths? Is there a way to make a path from a particular text?

KillyMXI avatar Jan 14 '22 18:01 KillyMXI

I'm trying to run pdf checker on a file generated with SkiaSharp, with no strings in it. The only non-empty notice in the report:

Cleanup Results
    Errors:
        None
    Information:
        Contains conservatively compressed streams:
            Uncompressed: (141 instances)
    Checks Completed:
        suboptimal-compression

Optimizing it with third-party tools took it from ~500 KB down to 100 KB.

Looks like there is something besides embedded fonts that could be optimized in Skia.

My sample is generated with Svg.Skia and the source only consists of vector lines. I've no idea what can be so inefficient there.

Trying to mess with SKDocumentPdfMetadata actually results in bigger file size. I would expect it to be a no-op, but if I supply any RasterDpi value or non-default EncodingQuality value, the file size jumps up another ~400kb. This doesn't make sense.

KillyMXI avatar Jan 17 '22 18:01 KillyMXI
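To illustrate why uncompressed streams matter for a vector-only file, the sketch below applies zlib (the format behind PDF's `FlateDecode` filter) to a made-up content stream of stroked line segments. The stream is synthetic and not taken from a SkiaSharp file, so the ratio it shows says nothing precise about the ~500 KB file discussed above.

```python
import zlib
from typing import Tuple


def flate_sizes(stream: bytes) -> Tuple[int, int]:
    """Return (raw size, size after zlib/Flate compression at level 9)."""
    return len(stream), len(zlib.compress(stream, 9))


# Hypothetical content stream: 500 stroked line segments ("m" = move to,
# "l" = line to, "S" = stroke). Not taken from a real SkiaSharp file.
sample_stream = b"".join(
    b"%d %d m %d %d l S\n" % (i, i, i + 10, i + 5) for i in range(500)
)

raw_size, packed_size = flate_sizes(sample_stream)
print(f"raw: {raw_size} bytes, compressed: {packed_size} bytes")
```

Path operators are plain ASCII with a lot of repetition, which is the kind of data that tends to compress well.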

Any updates for this? @mattleibow

domagojmedo avatar May 25 '22 11:05 domagojmedo

I ended up using a two-pass approach by building the font subsets before rendering and then passing those in to SkiaSharp.

@jeffska : would you have a gist or some place where we could take a look at what you put in place to build the font subsets externally ?

Greybird avatar Jul 18 '22 11:07 Greybird

So, how is the progress?

TimLee88 avatar Apr 04 '23 07:04 TimLee88

Wondering the same @TimLee88

wstaelens avatar Apr 05 '23 07:04 wstaelens

Images in PDFs don't seem to support 1bpp, which also increases the PDF size, correct?

wstaelens avatar Nov 08 '23 12:11 wstaelens

Hi, has anything happened here? We are looking for a solution. We had been using https://github.com/Sicos1977/ChromeHtmlToPdf to convert from SVG to PDF and moved to SkiaSharp to get rid of the Google chrome processes.

But now the PDF files which had been between 20 and 40 KB are now over 500 KB big. Since we convert a lot of files in production, and need to send these PDF files over ethernet to terminals, we would like the file sizes to be lower again. All used fonts are available on the terminals and there is no need to embed them in the PDF file.

So, is there someone working on this issue or will this not be implemented at all?

giz303 avatar Feb 02 '24 11:02 giz303

Hi, has anything happened here? We are looking for a solution. We had been using https://github.com/Sicos1977/ChromeHtmlToPdf to convert from SVG to PDF and moved to SkiaSharp to get rid of the Google chrome processes.

But now the PDF files which had been between 20 and 40 KB are now over 500 KB big. Since we convert a lot of files in production, and need to send these PDF files over ethernet to terminals, we would like the file sizes to be lower again. All used fonts are available on the terminals and there is no need to embed them in the PDF file.

Have you inspected the PDF to make sure the SVG isn't just being rasterized?

jeffska avatar Feb 02 '24 18:02 jeffska
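A quick, rough way to answer that question is to count image XObjects in the file: a PDF that was meant to contain only vector graphics but reports images has probably been rasterized somewhere along the way. The sketch below is a byte-level heuristic with a made-up function name, and it assumes the image dictionaries are stored uncompressed in the file.

```python
import re

# Image XObjects carry the entry "/Subtype /Image" in their dictionary.
IMAGE_XOBJECT = re.compile(rb"/Subtype\s*/Image\b")


def count_image_xobjects(pdf_bytes: bytes) -> int:
    """Count image XObject dictionaries visible in the raw bytes of a PDF."""
    return len(IMAGE_XOBJECT.findall(pdf_bytes))


# Hypothetical image dictionary, for illustration only.
fragment = b"<< /Type /XObject /Subtype /Image /Width 2480 /Height 3508 >>"
print(count_image_xobjects(fragment))
```

A count of zero on a file is consistent with purely vector output, although it does not prove it, given the assumption above.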

We've seen an increase because 1bpp images are not supported, resulting in larger PDFs (every 1bpp image is converted to 24bpp). The 1bpp PDFs come from scans made by multifunction devices.

Would like to see some support also for 1bpp... as this makes pdf's much much bigger. (especially when 1bpp glyph bitmaps are used)

wstaelens avatar Feb 05 '24 08:02 wstaelens
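The size impact of converting 1bpp to 24bpp can be estimated from the raw pixel data. The sketch below compares uncompressed sizes for an assumed A4 scan at 300 dpi (2480 x 3508 pixels); the page size is an example chosen here, not a figure from the thread, and real PDFs usually compress image data, so actual file sizes will differ.

```python
def raw_image_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Uncompressed image size in bytes, with each row padded to a whole byte."""
    row_bytes = (width * bits_per_pixel + 7) // 8
    return row_bytes * height


# Assumed example: A4 page scanned at 300 dpi.
WIDTH, HEIGHT = 2480, 3508

one_bpp = raw_image_bytes(WIDTH, HEIGHT, 1)
rgb_24bpp = raw_image_bytes(WIDTH, HEIGHT, 24)
print(one_bpp, rgb_24bpp, rgb_24bpp // one_bpp)
```

Before compression, the 24bpp version of this page holds 24 times as much pixel data as the 1bpp original.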

I don't care so much about performance or lib size and I already use HarfBuzzSharp for measuring text widths. Is there any way to (optionally) enable font subsetting with HarfBuzz?

flensrocker avatar Apr 02 '24 14:04 flensrocker