cakeml icon indicating copy to clipboard operation
cakeml copied to clipboard

Formalise DEFLATE and add implementation to examples

Open myreen opened this issue 4 years ago • 5 comments

Deflate is a lossless data compression file format that zip and gzip are based on.

The deflate algorithm would be a nice formalisation and verification exercise in HOL. The result of the formalisation effort could be used to construct a verified gzip-like executable using CakeML.

The same compression and decompression algorithm might come in handy inside the CakeML compiler, once the compiler stores bulky constant data structures as read-only data. Such bulky constants could, for example, be the AST for the basis library or the bignum library.

This project is likely to fit the scope of an internship or MSc thesis.

myreen avatar Nov 16 '20 14:11 myreen

Related work:

https://hal.archives-ouvertes.fr/hal-02149909/document

On Mon, Nov 16, 2020 at 8:07 AM myreen [email protected] wrote:

Deflate https://en.wikipedia.org/wiki/DEFLATE is a lossless data compression file format that zip https://en.wikipedia.org/wiki/ZIP_(file_format) and gzip https://en.wikipedia.org/wiki/Gzip are based on.

The deflate algorithm https://tools.ietf.org/html/rfc1951 would be a nice formalisation and verification exercise in HOL. The result of the formalisation effort could be used to construct a verified gzip-like executable using CakeML.

The same compression and decompression algorithm might come in handy inside the CakeML compiler, once the compiler stores bulky constant data structures as read-only data. Such bulky constants could, for example, be the AST for the basis library or the bignum library.

This project is likely to fit the scope of an internship or MSc thesis.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/CakeML/cakeml/issues/806, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAIOAD7JSIL7TJM64H4F4YTSQEWZLANCNFSM4TXGSYBA .

konrad-slind avatar Nov 16 '20 15:11 konrad-slind

Work on this could advance in stages:

  1. Implement the compression and decompression algorithms as functions in HOL
  2. Make it easy to prove ∀s. decompress (compress s) = s by making compress test whether decompress works for its output: if not, then it can return its input tagged as uncompressed.
  3. Extensively test that compress won't produce uncompressed data.
  4. Prove that the compress function always produces properly compressed data.

myreen avatar Nov 17 '20 08:11 myreen

Something very similar to the calculation of a Huffman tree given the weights for the alphabet is done in HOL’s examples/computability/kolmog/kraft_ineqScript.sml. (Weights are specified by giving the desired length of the code for each “symbol”.). It should probably be possible to refactor so as to extract the code for the tree-building etc.

mn200 avatar Nov 19 '20 00:11 mn200

I'm going to give this a go for my Honours project.

lxndrcx avatar Feb 01 '21 23:02 lxndrcx

slowly progressing here

lxndrcx avatar Apr 21 '21 09:04 lxndrcx