gf-core
LPGF: Linearisation-only PGF format
Introduction
Recently I've been working on resurrecting an old idea: adding support for a PGF file format which only supports linearisation, since linearisation on its own is quite a common use of GF. The motivations are:
- Faster & less memory-intensive compilation
- Smaller binary files
- Faster linearisation at runtime
- New features impossible with parsing, e.g. dynamic lexicon.
The format itself is described in section 2 of the paper:
"PGF: A Portable Run-Time Format for Type-Theoretical Grammars" Angelov, Bringert, Ranta (2009). http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.640.6330&rep=rep1&type=pdf
(where it is confusingly called "PGF"; what we call "PGF" today is really "PMCFG", section 3 of the same paper).
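To give a feel for what "linearisation-only" means, here is a minimal Haskell sketch of a term evaluator in the spirit of that section. Everything in it is an illustrative assumption: the names (Term, eval, linearise, toyGrammar) and the toy grammar are made up for this example and are not the definitions used in LPGF.hs. Variants and most other features are left out.

```haskell
-- Abstract syntax tree: a function name applied to its subtrees.
data Tree = App String [Tree]

-- Simplified linearisation terms (hypothetical; not the types in LPGF.hs).
data Term
  = Str String        -- a single token
  | Concat Term Term  -- concatenation of two terms
  | Tuple [Term]      -- a record or table, flattened to positional fields
  | Ix Int            -- an integer label used to select a field
  | Proj Term Term    -- select a field of a tuple
  | Arg Int           -- the linearisation of the i-th subtree
  deriving (Eq, Show)

-- A concrete syntax maps each abstract function to its linearisation term.
type Concrete = [(String, Term)]

-- Evaluate a term, given the already-linearised arguments.
eval :: [Term] -> Term -> Either String Term
eval args t = case t of
  Str _ -> Right t
  Ix _  -> Right t
  Arg i
    | i >= 0 && i < length args -> Right (args !! i)
    | otherwise                 -> Left ("argument out of range: " ++ show i)
  Tuple ts   -> Tuple <$> mapM (eval args) ts
  Concat a b -> Concat <$> eval args a <*> eval args b
  Proj a b   -> do
    a' <- eval args a
    b' <- eval args b
    case (a', b') of
      (Tuple ts, Ix i) | i >= 0 && i < length ts -> Right (ts !! i)
      _ -> Left "ill-formed projection"

-- Linearise a tree bottom-up: subtrees first, then the function's own term.
linTerm :: Concrete -> Tree -> Either String Term
linTerm cnc (App f ts) = do
  rule <- maybe (Left ("missing function: " ++ f)) Right (lookup f cnc)
  args <- mapM (linTerm cnc) ts
  eval args rule

-- Flatten a fully evaluated string term into its tokens.
tokens :: Term -> Either String [String]
tokens (Str s)      = Right [s]
tokens (Concat a b) = (++) <$> tokens a <*> tokens b
tokens _            = Left "result is not a string"

linearise :: Concrete -> Tree -> Either String String
linearise cnc tree = unwords <$> (linTerm cnc tree >>= tokens)

-- Toy grammar: a noun is a (singular, plural) pair picked by the determiner.
toyGrammar :: Concrete
toyGrammar =
  [ ("Pizza", Tuple [Str "pizza", Str "pizzas"])
  , ("This",  Concat (Str "this")  (Proj (Arg 0) (Ix 0)))
  , ("These", Concat (Str "these") (Proj (Arg 0) (Ix 1)))
  ]
```

With these definitions, linearise toyGrammar (App "These" [App "Pizza" []]) evaluates to Right "these pizzas", while a tree that uses a function absent from the grammar yields a Left value. No parsing machinery is involved at any point, which is what makes a smaller file and a simpler runtime plausible.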
Progress so far
This draft pull request contains the following:
- An implementation of the LPGF format and runtime (src/runtime/haskell/LPGF.hs) which is correct w.r.t. the PGF and PGF2 implementations, with the exception of:
  - Linearisation of missing functions (low priority)
  - Variants, which are intentionally not supported
- Compilation from GF (canonical) to LPGF (src/compiler/GF/Compiler/GrammarToLPGF.hs), which can be used in the expected way: gf --make --output-format=lpgf ...
- Test suite with unit-test, Foods, and Phrasebook grammars for testing correctness.
- Benchmark for comparing performance between PGF, PGF2 and LPGF.
Notable omissions
- The LPGF runtime API needs some cleanup (in particular, one shouldn't need to import PGF to use LPGF).
- The LPGF runtime should at least support type-checking of trees.
- The GF shell doesn't support LPGF. This would probably be nice to have eventually, but it is not a priority either.
- Bindings from the [Haskell] LPGF runtime to other languages (or actual implementations in other languages).
Performance
Unfortunately, so far I haven't been able to live up to all the performance goals:
- LPGF files are [often] smaller than PGFs 😄
- Runtime linearisation in LPGF is faster than both PGF and PGF2 🥳
- Compiling to LPGF is at least as slow and memory-consuming as compiling to PGF, and often significantly worse 😢
So my current focus is on improving the performance of the LPGF compiler, which I am struggling with. I have done what I can to improve the data structures and algorithms used, but I have little experience with tuning strictness and other aspects of Haskell performance. If anyone has more expertise in this area, please let me know and I can get more specific about where the bottlenecks are and what I've tried already. Until then, this pull request can remain open and serve as the place where any major updates to this project are made.