gf-core
Use a GF grammar for printing GF grammars
Most (all?) output formats can be described as a concrete syntax of a GF grammar.
This relates to issue #26, and should make it simpler to add and modify output formats.
Idea: write one GF grammar for canonical GF (i.e., what's left after compilation and partial evaluation), with output formats such as Haskell, JavaScript, JSON, YAML, etc. as concrete syntaxes. The same can be done for outputting PGF/PMCFG grammars, and possibly for speech grammars (which need some grammar transformations, such as left-recursion elimination).
This would affect the modules GF.Compiler, GF.Compile.Export, and others. In fact, several modules can probably be removed (such as GF.Compile.PGFtoXxx).
One extra feature would be that we don’t have to recompile GF to add a new output format. So, anyone can create their own :)
This sounds like a good idea with one clarification.
Every time I make major changes in the compiler or the runtime, I need to update a lot of things. This includes the different exports to Haskell, Python, Prolog, etc. I believe that many of those exports are not used by anyone. When I update them I make sure that they compile, but whenever necessary I may have to change the export format, and this has happened several times in the last few years. Yet no one has complained that this has broken their code. Therefore I believe that they were one-time experiments that were not used afterwards.
Moving these exports out of the compiler is a good idea. There should be an abstract syntax defined somewhere in another repository. The only responsibility of the compiler would be to produce one abstract syntax tree representing the GF grammar. When necessary the compiler maintainers should also update the abstract syntax for describing GF grammars.
The maintenance of the export formats should be left to the corresponding maintainers, who are the only ones who know how each format should be used.
That's exactly what I meant. I think.
Also: all of the current export formats (except Haskell concrete syntax) operate on the PGF grammar, but it's sometimes more useful to print the compiled GF grammar instead. So we need two GF grammars, and people can decide whether they want to render the GF or the PGF.
I made a very quick abstract syntax for canonical GF, i.e., what's left after compilation and partial evaluation, or what batchCompile returns here:
https://github.com/GrammaticalFramework/gf-core/blob/f32d222e7120b2cdbcf7959f2230d01588ee1aa0/src/compiler/GF/Compiler.hs#L47-L52
I know that I'm skipping over esoteric things such as lindef, data, def, printname, etc., and I have probably missed a lot of things, and there are probably better ways of doing it (ping @Thomas-H @krangelov @aarneranta @johnjcamilleri @inariksit et al). But here it is anyway:
abstract GFCanonical = {

  cat
    Grammar ; Abstract ; Concrete ; [Concrete]{0} ;
    CatDef ; [CatDef]{0} ;
    FunDef ; [FunDef]{0} ;
    SimpleType ; ComplexType ; [ComplexType]{0} ;
    TypeApplication ; TypeBinding ; [TypeBinding]{0} ;
    ParamDef ; [ParamDef]{0} ;
    ParamValue ; [ParamValue]{0} ; ParamType ;
    LincatDef ; [LincatDef]{0} ;
    LinType ; [LinType]{0} ;
    RecordRowType ; [RecordRowType]{0} ;
    LinDef ; [LinDef]{0} ;
    LinValue ; [LinValue]{0} ;
    TableRowValue ; [TableRowValue]{0} ;
    RecordRowValue ; [RecordRowValue]{0} ;
    CatId ; [CatId]{0} ;
    FunId ;
    ParamId ; [ParamId]{0} ;
    ValueId ;
    LabelId ;
    VarId ; [VarId]{0} ;

  fun
    grammar : Abstract -> [Concrete] -> Grammar ;
    abs : [CatDef] -> [FunDef] -> Abstract ;
    cnc : [ParamDef] -> [LincatDef] -> [LinDef] -> Concrete ;

    -- abstract category declarations
    simpleCatDef : CatId -> CatDef ;
    complexCatDef : CatId -> [CatId] -> CatDef ;

    -- abstract function declarations
    simpleFunDef : FunId -> SimpleType -> FunDef ;
    complexFunDef : FunId -> ComplexType -> FunDef ;
    simpleType : [CatId] -> CatId -> SimpleType ;
    complexType : [TypeBinding] -> TypeApplication -> ComplexType ;
    nobinding : ComplexType -> TypeBinding ;
    binding : VarId -> ComplexType -> TypeBinding ;
    noapplication : CatId -> TypeApplication ;
    application : CatId -> [ComplexType] -> TypeApplication ;

    -- concrete param declarations
    paramDef : ParamId -> [ParamValue] -> ParamDef ;
    simpleParamValue : ParamId -> ParamValue ;
    complexParamValue : ParamId -> [ParamId] -> ParamValue ;

    -- concrete lincat definitions
    lincatDef : CatId -> LinType -> LincatDef ;
    strType, intType, floatType : LinType ;
    paramType : ParamType -> LinType ;
    tableType : ParamType -> LinType -> LinType ;
    recordType : [RecordRowType] -> LinType ;
    tupleType : [LinType] -> LinType ;
    recordRowType : LabelId -> LinType -> RecordRowType ;

    -- concrete linearisation definitions
    linDef : FunId -> [VarId] -> LinValue -> LinDef ;
    strConstant : String -> LinValue ;
    intConstant : Int -> LinValue ;
    floatConstant : Float -> LinValue ;
    paramConstant : ParamValue -> LinValue ;
    tableValue : [TableRowValue] -> LinValue ;
    recordValue : [RecordRowValue] -> LinValue ;
    tupleValue : [LinValue] -> LinValue ;
    tableRowValue : ParamValue -> LinValue -> TableRowValue ;
    recordRowValue : LabelId -> LinValue -> RecordRowValue ;

    -- identifiers
    catid : String -> CatId ;
    funid : String -> FunId ;
    paramid : String -> ParamId ;
    valueid : String -> ValueId ;
    labelid : String -> LabelId ;
    varid : String -> VarId ;
    anonymous : VarId ;

}
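To make the "output format as concrete syntax" idea a bit more tangible, here is a minimal sketch covering only the category declarations. Everything in it is an assumption for illustration: the module names GFCanonicalMini, GFCanonicalMiniGF and GFCanonicalMiniHs are hypothetical, `abs` is simplified to take only `[CatDef]`, and the Haskell rendering (one empty `data` declaration per category) is just one possible choice, not an existing export format. Each module would have to live in its own file, named after the module.

```gf
-- File GFCanonicalMini.gf
-- Hypothetical fragment of the GFCanonical abstract syntax above.
abstract GFCanonicalMini = {
  flags startcat = Abstract ;
  cat
    Abstract ;
    CatDef ; [CatDef]{0} ;
    CatId ; [CatId]{0} ;
  fun
    -- simplified: the full draft also takes [FunDef]
    abs : [CatDef] -> Abstract ;
    simpleCatDef : CatId -> CatDef ;
    complexCatDef : CatId -> [CatId] -> CatDef ;
    catid : String -> CatId ;
}

-- File GFCanonicalMiniGF.gf
-- Renders the tree back as GF source text.
concrete GFCanonicalMiniGF of GFCanonicalMini = {
  lincat
    Abstract = {s : Str} ;
    CatDef = {s : Str} ;
    [CatDef] = {s : Str} ;
    CatId = {s : Str} ;
    [CatId] = {s : Str} ;
  lin
    abs cats = {s = "cat" ++ cats.s} ;
    simpleCatDef c = {s = c.s ++ ";"} ;
    complexCatDef c args = {s = c.s ++ args.s ++ ";"} ;
    -- the {0} lists generate BaseX (empty list) and ConsX
    BaseCatDef = {s = ""} ;
    ConsCatDef d ds = {s = d.s ++ ds.s} ;
    BaseCatId = {s = ""} ;
    ConsCatId c cs = {s = c.s ++ cs.s} ;
    catid str = {s = str.s} ;
}

-- File GFCanonicalMiniHs.gf
-- Renders each category as an empty Haskell data declaration.
-- Context arguments of a category are dropped in this rendering.
concrete GFCanonicalMiniHs of GFCanonicalMini = {
  lincat
    Abstract = {s : Str} ;
    CatDef = {s : Str} ;
    [CatDef] = {s : Str} ;
    CatId = {s : Str} ;
    [CatId] = {s : Str} ;
  lin
    abs cats = {s = cats.s} ;
    simpleCatDef c = {s = "data" ++ c.s ++ ";"} ;
    complexCatDef c args = {s = "data" ++ c.s ++ ";"} ;
    BaseCatDef = {s = ""} ;
    ConsCatDef d ds = {s = d.s ++ ds.s} ;
    BaseCatId = {s = ""} ;
    ConsCatId c cs = {s = c.s ++ cs.s} ;
    catid str = {s = str.s} ;
}
```

With both concrete syntaxes imported in the GF shell, linearizing a tree such as `abs (ConsCatDef (simpleCatDef (catid "S")) (ConsCatDef (simpleCatDef (catid "NP")) BaseCatDef))` would render the same two category declarations once as GF source and once as Haskell. Adding another output format then means writing another concrete syntax, without touching the compiler. Formats that need separators between list items (for example, commas in JSON) would likely need a richer lincat for the list categories than the plain `{s : Str}` used here.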