psychec
                                
                        A compiler frontend for the C programming language
Psyche-C
Psyche-C is a compiler frontend for the C programming language, designed specifically for implementing static analysis tools. The following features set it apart:
- Clean separation between the syntactic and semantic compiler phases.
- Algorithmic and heuristic syntax disambiguation.
- Type inference of missing struct, union, enum, and typedef
 (i.e., tolerance and "recovery" against #include failures).
- API inspired by that of the Roslyn .NET compiler.
- AST resembling that of LLVM's Clang frontend.
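To illustrate why syntax disambiguation matters in C, here is a minimal sketch of the classic typedef-name ambiguity: the same token sequence `a * b` is a declaration or an expression depending on what `a` names. The function names are hypothetical and chosen only for this example; the sketch shows the ambiguity itself, not how Psyche-C resolves it.

```c
#include <assert.h>

typedef int a;   /* at file scope, `a` names a type */

/* Here `a * b` starts a declaration: `b` is a pointer to `a` (i.e. int). */
int parsed_as_declaration(void)
{
    a * b = 0;
    assert(sizeof *b == sizeof(int));
    return b == 0;
}

/* Here a local variable hides the typedef, so `a * b` is a multiplication. */
int parsed_as_expression(void)
{
    int a = 6, b = 7;
    return a * b;
}
```

A conventional compiler settles this by consulting the declarations in scope. A frontend that has to cope with failed #include directives may not have the declaration of `a` at hand, which is presumably where algorithmic and heuristic disambiguation comes in.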
Library and API
Psyche-C is implemented as a library. Its native API is in C++ (APIs for other languages are on the way).
void analyse(const SourceText& srcText, const FileInfo& fi)
{
    ParseOptions parseOpts;
    parseOpts.setTreatmentOfAmbiguities(ParseOptions::TreatmentOfAmbiguities::DisambiguateAlgorithmically);
    
    auto tree = SyntaxTree::parseText(srcText,
                                      TextPreprocessingState::Preprocessed,
                                      TextCompleteness::Fragment,
                                      parseOpts,
                                      fi.fileName());
    auto compilation = Compilation::create("code-analysis");
    compilation->addSyntaxTree(tree.get());
    AnalysisVisitor analysis(tree.get(), compilation->semanticModel(tree.get()));
    analysis.run(tree->translationUnitRoot());
}
SyntaxVisitor::Action AnalysisVisitor::visitFunctionDefinition(const FunctionDefinitionSyntax* node)
{
    // Look up the symbol declared by this function definition.
    const auto* sym = semaModel->declaredSymbol(node);
    if (sym->kind() == SymbolKind::Function) {
        const FunctionSymbol* funSym = sym->asFunction();
        // ...
    }
    // Do not descend into the children of this node.
    return Action::Skip;
}
The cnippet Driver
Psyche-C comes with the cnippet driver so that it can also be used as an ordinary C parser.
void f()
{
    int ;
}
If you "compile" the snippet above with cnippet, you'll see a diagnostic similar or identical to what you would see with GCC or Clang.
~ cnip test.c
test.c:4:4 error: declaration does not declare anything
int ;
    ^
NOTE: Semantic analysis isn't yet complete.
Type Inference
Psyche-C can infer the missing types of a code snippet (a.k.a. an incomplete program or program fragment).
void f()
{
    T v = 0;
    v->value = 42;
    v->next = v;
}
If you compile the snippet above with GCC or Clang, you'll see a diagnostic such as "declaration for T is not available".
With cnippet, "compilation" succeeds, as the following definitions are (implicitly) synthesised.
typedef struct TYPE_2__ TYPE_1__;
struct TYPE_2__ 
{
    int value;
    struct TYPE_2__* next;
} ;
typedef TYPE_1__* T;
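As a rough check of what these definitions buy you, the sketch below places them in front of code that performs the same field accesses as the snippet. `init_node` is a hypothetical helper written for this example (it is not produced by Psyche-C); it takes the node as a parameter so that, unlike the original `T v = 0;`, nothing is dereferenced through a null pointer when the code is actually run.

```c
#include <assert.h>
#include <stddef.h>

/* Definitions as synthesised in the text above. */
typedef struct TYPE_2__ TYPE_1__;
struct TYPE_2__
{
    int value;
    struct TYPE_2__* next;
};
typedef TYPE_1__* T;

/* Hypothetical helper: the same accesses as in f(), applied to a
   caller-provided node instead of a null pointer. */
int init_node(T v)
{
    assert(v != NULL);
    v->value = 42;   /* requires a member `value` that accepts an int */
    v->next = v;     /* requires a member `next` that accepts a T */
    return v->value;
}
```

With the typedefs in place, an ordinary C compiler accepts both accesses: `value` holds an int and `next` holds a pointer to the same structure, which is what the two assignments in the snippet require.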
A few applications of type inference for C:
- Enabling, on incomplete source-code, static analysis techniques that require fully-typed programs.
- Compiling partial code (e.g., a snippet retrieved from a bug tracker) for object-code inspection.
- Generating test-input data for a function in isolation (without its dependencies).
- Quick prototyping of an algorithm, without the need of explicit types.
NOTE: Type inference isn't yet available on master, only in the original branch.
Documentation and Resources
- The Doxygen-generated API.
- A contributor's wiki.
- An online interface that offers a glimpse of Psyche-C's type inference functionality.
- Articles/blogs:
Building and Testing
Except for type inference, which is written in Haskell, Psyche-C is written in C++17; cnippet is written in Python 3.
To build:
cmake CMakeLists.txt && make -j 4
To run the tests:
./test-suite
Related Publications
- Type Inference for C: Applications to the Static Analysis of Incomplete Programs.
  ACM Transactions on Programming Languages and Systems (TOPLAS), Volume 42, Issue 3, Article No. 15, Dec. 2020.
- Inference of static semantics for incomplete C programs.
  Proceedings of the ACM on Programming Languages, Volume 2, Issue POPL, Jan. 2018, Article No. 29.
- AnghaBench: a Suite with One Million Compilable C Benchmarks for Code-Size Reduction.
  Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization (CGO), 2021.
- Generation of in-bounds inputs for arrays in memory-unsafe languages.
  Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization (CGO), Feb. 2019, p. 136-148.
- Automatic annotation of tasks in structured code.
  Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT), Nov. 2018, Article No. 31.