
Document CPG schema guidelines

Open langston-barrett opened this issue 1 year ago • 0 comments

We repeatedly run into questions like the ones below, and I haven't thought through the answers thoroughly enough. I'd like to take the time to consider them more deeply, list pros and cons, record our thoughts and conclusions, and enact them by changing the schema where necessary.

  • [ ] Which direction should AST edges point: child to parent (InstructionToParentBlock) or parent to child (BlockToChildInstruction)?
  • [ ] Sometimes, we have edges that represent relationships that conceptually could hold between many node types (especially in ASTs), like InstructionToParentBlock. Should we use the same edge kind and name for all of these, or separate edge kinds depending on the endpoints (e.g. one kind for LLVM, one for ASM, etc.)?
  • [ ] When we have two analyses that provide the same sort of information, but with varying precision or other factors, like the LLVM and Datalog pointer analyses, should we use the same or different edge kinds?
  • [ ] How should the structure of the CPG reflect superclass-subclass relations, e.g. those that appear in LLVM? @scott and I identified four possibilities here: https://gitlab-ext.galois.com/mate/MATE/-/merge_requests/499
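To make the edge-direction question concrete, here is a minimal sketch in plain Python. It does not use the MATE API. The `Edge` tuple, the node ids, and the `invert` and `outgoing` helpers are all hypothetical; only the two edge kind names come from the question above. It shows that both directions carry the same information, so the choice mostly affects which lookup is cheap and which name we have to remember.

```python
from collections import defaultdict
from typing import NamedTuple


class Edge(NamedTuple):
    """Hypothetical edge record; not the real CPG representation."""
    kind: str    # edge kind name, e.g. "InstructionToParentBlock"
    source: str  # id of the node the edge leaves
    target: str  # id of the node the edge enters


# Child-to-parent direction: one edge per instruction, pointing at its block.
# The node ids are made up for illustration.
child_to_parent = [
    Edge("InstructionToParentBlock", "inst1", "block1"),
    Edge("InstructionToParentBlock", "inst2", "block1"),
    Edge("InstructionToParentBlock", "inst3", "block2"),
]


def invert(edges, new_kind):
    """Flip every edge and rename its kind, preserving order."""
    return [Edge(new_kind, e.target, e.source) for e in edges]


def outgoing(edges):
    """Index edges by source node: maps each source id to its targets."""
    index = defaultdict(list)
    for e in edges:
        index[e.source].append(e.target)
    return dict(index)


parent_to_child = invert(child_to_parent, "BlockToChildInstruction")
parent_of = outgoing(child_to_parent)    # instruction id -> [block id]
children_of = outgoing(parent_to_child)  # block id -> [instruction ids]
```

Since inverting twice gives back the original edges, storing both kinds would be redundant. Whichever direction we pick, the other is derivable, so the guideline is mainly about consistency across the AST edge kinds.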

Additionally, the following are guidelines that we already follow and could document:

  • [x] The LLVM AST is primary, so node and edge kinds in this AST are un-prefixed, e.g., Function and Instruction refer to LLVM-level functions and instructions.
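As a rough illustration of the un-prefixed convention, the sketch below classifies a kind name by its prefix and treats anything without a known prefix as LLVM-level. The `LAYER_PREFIXES` table, the `layer_of` helper, and the `ASMInstruction` name are assumptions made up for this example; they are not taken from the actual schema.

```python
# Hypothetical prefix table for non-LLVM layers (an assumption, not the
# real schema). LLVM-level kinds carry no prefix, so they are not listed.
LAYER_PREFIXES = {"ASM": "ASM"}


def layer_of(kind: str) -> str:
    """Return the layer a node or edge kind name belongs to.

    Kinds with a known prefix map to that prefix's layer; everything
    else is treated as LLVM-level, since the LLVM AST is primary.
    """
    for prefix, layer in LAYER_PREFIXES.items():
        if kind.startswith(prefix):
            return layer
    return "LLVM"
```

For instance, `layer_of("Function")` and `layer_of("Instruction")` both return `"LLVM"`, while the hypothetical `layer_of("ASMInstruction")` returns `"ASM"`. One consequence worth noting if we document this: an LLVM-level kind whose name happened to start with a layer prefix would be misclassified, so prefixes need to be reserved.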

Migrated from internal (Gitlab) MATE issue number(s) 550

langston-barrett, Aug 23 '22 19:08