WIP implement compression for sig and optdata

Open KevinRansom opened this issue 1 year ago • 1 comments

The F# compiler adds compile-time metadata and optimization data to built assemblies.

This PR adds two new compiler switches that control the production of this data, and adds compression functionality.

The switches are:

  • --optimizationdata:{none|compress|file}
      Specify included optimization information. Important for distributed libraries.
  • --interfacedata:{none|compress|file}
      Include F# interface information, the default is compressed. Essential for distributing libraries.

The options are:

  • none --- include no metadata
  • compress --- compress metadata
  • file --- embed raw file
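
As a rough model of what the three values mean, here is a minimal Python sketch. The function name `embed_metadata` is hypothetical, and `zlib` is only a stand-in codec; none of this is the compiler's actual implementation.

```python
import zlib
from typing import Optional


def embed_metadata(raw: bytes, mode: str) -> Optional[bytes]:
    """Hypothetical model of the three option values (not the compiler's code)."""
    if mode == "none":
        return None                # nothing is embedded in the assembly
    if mode == "compress":
        return zlib.compress(raw)  # stand-in codec, chosen only for illustration
    if mode == "file":
        return raw                 # bytes are embedded unchanged
    raise ValueError(f"unknown mode: {mode!r}")


# Repetitive signature-like text, used here as stand-in metadata.
sample = b"val map : ('a -> 'b) -> 'a list -> 'b list\n" * 200

for mode in ("none", "compress", "file"):
    blob = embed_metadata(sample, mode)
    print(mode, None if blob is None else len(blob))
```

A consumer of the "compress" output has to inflate it before reading, which is why older compilers that do not know about compression cannot read such an assembly.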

Compression of this data can cause a large decrease in working set consumed by assemblies at runtime. For example:

  • FSharp.Core
      Shrinks from 3,002 KB down to 2,118 KB, a reduction of roughly 900 KB.
  • FSharp.Compiler.Service
      Shrinks from 17,222 KB down to 15,656 KB, a reduction of roughly 1,570 KB.
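
The savings can be checked directly from the four sizes quoted above. This short Python snippet only does the arithmetic; the helper name `reduction` is made up for the example.

```python
# Sizes in KB as quoted above: (before, after).
sizes_kb = {
    "FSharp.Core": (3002, 2118),
    "FSharp.Compiler.Service": (17222, 15656),
}


def reduction(before_kb: int, after_kb: int):
    """Return (KB saved, percentage saved relative to the original size)."""
    saved = before_kb - after_kb
    return saved, 100.0 * saved / before_kb


for name, (before, after) in sizes_kb.items():
    saved, pct = reduction(before, after)
    print(f"{name}: {saved} KB saved ({pct:.1f}%)")
```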

Not that this is important, but fsc.exe drops from 29K down to 16K, which was kinda unexpected, to me at least.

To verify that FSharp.Core and the compiler service behave correctly throughout when compiled with compressed metadata, this PR currently builds with compression on.

We need to discuss and decide what to do about the following:

  1. We probably can't turn it on for FSharp.Core by default: projects built with older compilers will not be able to consume an FSharp.Core, or any other DLL for that matter, containing compressed metadata. Currently I propose shipping with non-compressed FSharp.Core for 1 year and then switching over to compressed metadata. .NET 7.0 would be a great point to switch over, but that's basically the next release.

  2. I propose that from the release of the .NET 7.0 SDK we compress FSharp.Compiler.Service metadata. The FCS developer community is fairly small and uses up-to-date tools, so hopefully it wouldn't inconvenience them too much.

  3. By default the compiler will embed raw files as it does today. In 1 year, change the defaults to embed compressed files.

KevinRansom avatar Aug 11 '22 08:08 KevinRansom

The proposed policy seems spot on

dsyme avatar Aug 11 '22 16:08 dsyme

So, we won't use memory mapped files for in-memory sigdata anymore?

vzarytovskii avatar Aug 13 '22 15:08 vzarytovskii

Also, curious what's the file size difference for FSharp.Core before and after this change?

vzarytovskii avatar Aug 13 '22 16:08 vzarytovskii

3 MB before, 2.1 MB after. I can get exact numbers when I get to my desktop.

KevinRansom avatar Aug 13 '22 16:08 KevinRansom

So, we won't use memory mapped files for in-memory sigdata anymore?

Well, the file will contain compressed data. Once decompressed, it can't be memory mapped by definition :-)

KevinRansom avatar Aug 13 '22 16:08 KevinRansom

So, we won't use memory mapped files for in-memory sigdata anymore?

Well, the file will contain compressed data. Once decompressed, it can't be memory mapped by definition :-)

Yeah, but I think we were using it all the way around: instead of storing it entirely in memory, we were using mmapped files (so, pretty much offloading it to the filesystem).

Not that it matters much anyway. The compiler might use a bit more memory now, a few megs at worst.
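
To make the trade-off concrete, here is a small Python sketch, with `mmap` and `zlib` standing in for whatever the compiler actually uses. The file names and payload are invented for the example. A raw file can be mapped and sliced, with the OS paging bytes in on demand, while a compressed file has to be inflated into process memory before any slice can be read.

```python
import mmap
import os
import tempfile
import zlib

payload = bytes(range(256)) * 4096  # 1 MiB of stand-in "sigdata"

with tempfile.TemporaryDirectory() as tmp:
    raw_path = os.path.join(tmp, "raw.sigdata")        # hypothetical file names
    packed_path = os.path.join(tmp, "packed.sigdata")
    with open(raw_path, "wb") as f:
        f.write(payload)
    with open(packed_path, "wb") as f:
        f.write(zlib.compress(payload))

    # Raw file: map it read-only and read a slice straight out of the mapping.
    with open(raw_path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
        mapped_slice = bytes(view[1000:1016])

    # Compressed file: the whole payload is inflated into a heap buffer first.
    with open(packed_path, "rb") as f:
        inflated = zlib.decompress(f.read())
    heap_slice = inflated[1000:1016]

print(mapped_slice == heap_slice, len(inflated))
```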

vzarytovskii avatar Aug 13 '22 16:08 vzarytovskii

@KevinRansom Have you measured the performance changes when reading compressed metadata of a bigger assembly?

@safesparrow This could probably be an interesting case for the performance testing framework you've been working on.

auduchinok avatar Aug 16 '22 18:08 auduchinok

@dsyme, thanks for the feedback. It never really occurred to me that resources and optimization might be separately required, but I agree it is conceivable, so retaining the existing two switches and adding a compressextrastuff switch seems like a better design.

KevinRansom avatar Aug 17 '22 17:08 KevinRansom

@KevinRansom Have you measured the performance changes when reading compressed metadata of a bigger assembly?

@safesparrow This could probably be an interesting case for the performance testing framework you've been working on.

@auduchinok

To directly answer your question: no

The motivation for adding the feature is for situations where size on disk or download is the dominating concern.

I expect that performance would inform our choice of either defaults or guidance. For now and the next year we are not changing the defaults, although that is out of concern for compatibility, not performance. That said, this PR does change what we do with the components built in this repo. Notably, FCS will now use compressed metadata, so we do need to explore the performance implications of this.

My intuition is that:

  • Wall clock performance changes will not be noticeable.
  • Additional decompression time should be offset by having fewer bytes to load.
  • The assembly load and this read should already be cached, so I don't expect this to be a repeated activity for a specific assembly.
  • Working set will rise slightly, by the size of the compressed data on disk.
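
One way to test that intuition is a small timing harness along these lines. This is a Python sketch with `zlib` as a stand-in codec and synthetic data, so the numbers it prints say nothing about the F# compiler itself; both files will also be hot in the OS cache, which favours the raw read.

```python
import os
import tempfile
import time
import zlib


def best_of(fn, repeats=5):
    """Return the fastest wall-clock time of several runs of fn()."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


# Synthetic, repetitive "metadata"; real sigdata will compress differently.
raw = b"".join(b"member M%d : int -> string\n" % i for i in range(100_000))

with tempfile.TemporaryDirectory() as tmp:
    raw_path = os.path.join(tmp, "raw.bin")
    packed_path = os.path.join(tmp, "packed.bin")
    with open(raw_path, "wb") as f:
        f.write(raw)
    with open(packed_path, "wb") as f:
        f.write(zlib.compress(raw))

    def load_raw():
        with open(raw_path, "rb") as f:
            return f.read()

    def load_packed():
        with open(packed_path, "rb") as f:
            return zlib.decompress(f.read())

    raw_size = os.path.getsize(raw_path)
    packed_size = os.path.getsize(packed_path)
    raw_s = best_of(load_raw)
    packed_s = best_of(load_packed)
    round_trip_ok = load_packed() == raw

print(f"raw:    {raw_size} bytes, {raw_s * 1000:.2f} ms")
print(f"packed: {packed_size} bytes, {packed_s * 1000:.2f} ms (read + inflate)")
```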

It should be noted that access to this metadata is only a compile-time activity. Applications built using our tools will not use it, unless they are themselves tools built on FCS to do compilation.

We are certainly open to switching FCS back to uncompressed if there is a noticeable performance degradation. What we deliver to the SDK will likely remain compressed, because there is pressure on size from there for sure, unless of course the degradation is significant.

I hope this clears things up.

KevinRansom avatar Aug 17 '22 17:08 KevinRansom

Have you tested different compression algorithms? There are more alternatives in System.IO.Compression.
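
A comparison like that could start from something as simple as the sketch below. It uses the codecs in Python's standard library (`zlib`, `bz2`, `lzma`) purely as stand-ins, since they are not the same set that System.IO.Compression offers, and the sample data is synthetic.

```python
import bz2
import lzma
import zlib

# Stand-in codecs from the Python standard library.
codecs = {
    "zlib": (zlib.compress, zlib.decompress),
    "bz2": (bz2.compress, bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}

# Synthetic signature-like text; real metadata would need to be measured.
sample = b"".join(b"val f%d : int -> string -> unit\n" % i for i in range(50_000))

results = {}
for name, (compress, decompress) in codecs.items():
    packed = compress(sample)
    if decompress(packed) != sample:  # every codec must round-trip exactly
        raise RuntimeError(f"{name} failed to round-trip")
    results[name] = len(packed)
    print(f"{name}: {len(sample)} -> {len(packed)} bytes")
```

Size is only half of the picture; decompression speed matters just as much for compile times, so a real comparison would time each codec as well.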

Happypig375 avatar Aug 25 '22 02:08 Happypig375