CompCert
CompCert copied to clipboard
RFC: stable memory block names for globals
These changes introduce a more structured namespace for memory blocks: a block id can either be a global ident
(new), or a dynamically allocated positive
(as before). Using this approach, we can ensure that global environments always assign the same block name to a global identifier id
, so that for all ge1
, ge2
we have:
Genv.find_symbol ge1 id = Some b1 ->
Genv.find_symbol ge2 id = Some b2 ->
b1 = b2.
Rationale
In all work around compositional semantics, we need to ensure that the memory blocks for globals are named consistently across modules. For instance,
- In CompCertX, we force programs to follow a predetermined structure so that the same identifiers will always be mapped to the same block ids.
- Compositional CompCert assumes that the global environments of linked modules are consistent, arguing that we can always add external definitions to ensure that's the case.
Our changes should make it possible to eliminate these hacks/assumptions.
Status
We have updated all of CompCert v3.2 to use this new naming scheme.
We plan to use these changes in our own ongoing work on compositional compilation, and we may tweak them further over the coming weeks. However, since these changes have a broader potential applicability and we're hoping for them to be merged into CompCert at some point, we want to collect some feedback as we go forward with that plan.
Notes:
- The new namespace is specified as an abstract
Module Type
, making it easy to explore richer namespace structures in the future. - In Initializers.v, it is no longer necessary for
constval
/transl_init
to abuseVptr
to store the global identifier of referenced variables, since it is now possible to use a pointer value that makes sense. - Since global pointers are preserved by compilation, it is now possible to use
val
directly in events, as demonstrated by ourglobmem-eventval
branch, which defineseventval := val
. - With the uniform block naming convention, it should be possible to define a linker for global environments corresponding to the existing linker for programs, and to relate new notions of semantic composition to the already implemented syntactic composition.
Known remaining issue:
- Our implementation of
BMap
usingEMap
may be problematic for the performance of the interpreter. This can be easily fixed with some more work. - Since block names for globals are predetermined, it might be advantageous to replace
find_symbol
with a simplerhas_symbol
boolean predicate (for example), or remove it entirely.
Thanks for the proposal. I haven't had time yet to study the diff but will do soon. @andrew-appel, @jhjourdan, @robbertkrebbers, any opinions on this proposal?
We will discuss this in the VST group and get back to you.
Today we (@andrew-appel, @lennartberinger, @scuellar, @jeremie-koenig, @pwilke) discussed this by phone. We agree that the proposal addresses part of the problem, but does not address the problem of consistent (across compilation units) mapping of ASCII strings ("names") to AST.ident. We believe that very possibly this pull request could be a useful part of the solution to that larger problem, but we are still studying and discussing it.
One thing @pwilke and I did not think to mention this during the call: in this pull request, the ASCII names <-> AST.ident
mapping does in fact show up, because the interpreter needs to be able to print out pointers:
- In
BlockNames.v
we use an axiomident_to_string: ident -> string
to implement aBlock.to_string : block -> string
function; - In
extraction.v
we realize this axiom as(fun i -> Camlcoq.coqstring_of_camlstring (Camlcoq.extern_atom i))
; - In
Interp.ml
we useBlock.to_string
in the definition offormat_value
.
This allows us to print out a global pointer &globvar[12]
as <glob:globvar+48>
, and a dynamic one &localvar
as <dyn:123+0>
(say). Previously both would have been printed out using numerical block-ids. We could move the definition of ident_to_string
to a more central place if we want to make that a more general feature or use it in other ways.
This does not really address the problem mentioned by @andrew-appel, which I understand as: What if I want to use strings that never went through the parser, therefore are not in the map? This happens anytime you use CompCert as a library rather than a self-contained compiler: in VST, when you use the output of clightgen
; likewise in CertiKOS, our "source-ish" code is Clight/Asm ASTs in .v files.
VST/CertiKOS use things like:
Definition foo : ident := 1%positive.
Definition bar : ident := 2%positive.
...
Definition myast : function := ... foo ... bar ...
Instead, we would want to use something like:
Definition foo : ident := ident_of_string "foo".
Definition bar : ident := ident_of_string "bar",
or perhaps even declare ident_of_string
as a coercion and use strings directly when defining myast
. This way we would never again need to mess around with potentially inconsistent tables of names <-> ident mappings in VST and CertiKOS, and there would never be a problem putting together modules that have been developped independently.
I believe we can accomplish this in the following way:
In the Coq part of CompCert, we axiomatize ident_to_string
and string_to_ident
as a predefined, but unspecified bijection between strings and ident's. Such a mapping does exists, but any particular one will be inefficient in general (the strings we do happen to use in a given case will correpond to sparsely distributed, large numbers).
But since we're free to use any such mapping, in the ML part, we can simply implement string_to_ident
using a stateful process that allocates small numbers on-demand (as is already done). The key thing here is that although string_to_ident
will be effectful, this can never be observed from the outside. In effect, we progressively "discover" the specific mapping we want to use.
Using this approach, we don't need to deal with the statefulness of the table in Coq, but still get the benefits. I believe implementing this can be done independently of our changes to the block namespace.
Here's a proof-of-concept https://github.com/CertiKOS/compcert/commit/85d2b87ccd1202bf9041abf662bce0768fcc7349
At this point I thought I'd write some kind of "diff walk-through" to facilitate any digging people would want to do.
The main changes occur in:
-
common/AST.v declares the
string_of_ident
parameter (realized together withstring_of_positive
with Caml code inextraction.v
). This is needed in BlockNames.v to print out block names as strings. -
common/BlockNames.v is a new file which defines the interface and implementation of the new block namespace. It includes an empirical eauto-based
blomega
tactic that can discharge most of the inequalities about blocks thatxomega
used to handle. -
common/Values.v now defines
block := Block.t
, rather thanpositive
. -
common/Memtype.v is modified to reflect changes to the memory model interface:
- There is a new invariant that
Block.init <= nextblock m
. - There is a new operation
alloc_at
used for allocating global blocks by name. -
inject_neutral
no longer needs a threshold (Block.init
is always what we want).
- There is a new invariant that
-
common/Memory.v implements those changes.
- Note that memory states now use
BMap
rather thanPMap
. In the most recent versionBMap
is implemented usingIMap
through an embedding of the newblock
type intopositive
, meaning the performance implications for the interpreter should be minimal.
- Note that memory states now use
-
common/Globalenvs.v is somewhat simplified:
-
genv_next
is no longer necessary (the constantBlock.init
now plays this role) -
genv_defs
is now indexed byAST.ident
values rather thanblock
- the remaining maps and invariants in
Genv.t
are no longer necessary (they can be simple definitions and lemmas based ongenv_defs
and block operations); onlygenv_public
remains -
init_mem
is modified to usealloc_at
-
Less remarkable changes are:
-
common/Maps.v: make sure the
MAP
interface has everything needed ofBMap
, and make sureEMap
andIMap
implement it. -
cfrontend/Initializers(proof).v has one hack pleasantly removed, whereby
ident
values would be stored asblock
in pointers (now stored asBlock.glob i
). -
driver/Interp.ml now uses
block_compare
in state comparisons, andBlock.to_string
to print out pointers.
The rest of the patch consists of replacing positive
operators, predicates, lemmas, and tactics by their new block
equivalents --- a set of numerous but fairly trivial changes. The ~550 new lines introduced by the patch mainly are BlockNames.v
and the new Mem.alloc_at
.