golang callgraph hn discussion/brainstorm

Open hbt opened this issue 5 years ago • 1 comments

Hello zomglings,

hbt here from https://news.ycombinator.com/item?id=25007701

I'd like to reply to your comment, then explain what I'm planning to build.

Your questions

Would this be a lot easier in go? The case that scares me a bit in Go is implicit imports (e.g. importing a driver for use with database/sql). But maybe all that is very clear when doing static analysis?

A: Utilities and APIs for this already exist for Go.

guru https://github.com/golang/tools/tree/master/cmd/guru

Example of usage: guru -json -scope "github.com/hbt/fh/cmd/hbt-cli" callers "hbt-cli/lib/util/git/git.go:#645"

Example of output:

[
    {
        "pos": "/home/hassen/workspace/fh/hbt-cli/lib/cmds/devcmd/gw.go:122:63",
        "desc": "static function call",
        "caller": "(github.com/hbt/fh/hbt-cli/lib/cmds/devcmd.GitWorkDev).handleCustomProjects"
    },
    {
        "pos": "/home/hassen/workspace/fh/hbt-cli/lib/cmds/devcmd/gw.go:123:63",
        "desc": "static function call",
        "caller": "(github.com/hbt/fh/hbt-cli/lib/cmds/devcmd.GitWorkDev).handleCustomProjects"
    },
    {
        "pos": "/home/hassen/workspace/fh/hbt-cli/lib/util/goutil/mage.go:71:40",
        "desc": "static function call",
        "caller": "(github.com/hbt/fh/hbt-cli/lib/util/goutil.MageBuild).BuildLocally"
    },
    {
        "pos": "/home/hassen/workspace/fh/hbt-cli/lib/cmds/go/order.go:32:35",
        "desc": "static function call",
        "caller": "(github.com/hbt/fh/hbt-cli/lib/cmds/go.GoOrderCmd).Run$1"
    },
    {
        "pos": "/home/hassen/workspace/fh/hbt-cli/lib/cmds/go/impl.go:43:38",
        "desc": "static function call",
        "caller": "(github.com/hbt/fh/hbt-cli/lib/cmds/go.GoImplementInterfaceCmd).Run"
    },
    {
        "pos": "/home/hassen/workspace/fh/hbt-cli/lib/cmds/selfdev/create.go:61:38",
        "desc": "static function call",
        "caller": "(github.com/hbt/fh/hbt-cli/lib/cmds/selfdev.CreateCmd).getPackagePath"
    }
]
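
Output in this shape is straightforward to post-process. Here is a minimal sketch, assuming the captured `-json` output of `guru callers` is a JSON array of objects with the `pos`, `desc`, and `caller` fields shown above. The helper name `group_call_sites` is hypothetical and not part of guru.

```python
import json
from collections import defaultdict


def group_call_sites(guru_json: str) -> dict[str, list[str]]:
    """Map each caller to the positions where it calls the queried function."""
    sites: dict[str, list[str]] = defaultdict(list)
    for entry in json.loads(guru_json):
        sites[entry["caller"]].append(entry["pos"])
    return dict(sites)


# Three entries copied from the output above.
SAMPLE = """
[
  {"pos": "/home/hassen/workspace/fh/hbt-cli/lib/cmds/devcmd/gw.go:122:63",
   "desc": "static function call",
   "caller": "(github.com/hbt/fh/hbt-cli/lib/cmds/devcmd.GitWorkDev).handleCustomProjects"},
  {"pos": "/home/hassen/workspace/fh/hbt-cli/lib/cmds/devcmd/gw.go:123:63",
   "desc": "static function call",
   "caller": "(github.com/hbt/fh/hbt-cli/lib/cmds/devcmd.GitWorkDev).handleCustomProjects"},
  {"pos": "/home/hassen/workspace/fh/hbt-cli/lib/util/goutil/mage.go:71:40",
   "desc": "static function call",
   "caller": "(github.com/hbt/fh/hbt-cli/lib/util/goutil.MageBuild).BuildLocally"}
]
"""

for caller, positions in group_call_sites(SAMPLE).items():
    print(f"{len(positions)} call site(s) in {caller}")
```

Grouping by caller gives a quick view of which functions depend on the queried one, and how heavily.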

The guru tool uses an existing library to parse the AST and provides an API for querying Go programs:

        callees         show possible targets of selected function call
        callers         show possible callers of selected function
        callstack       show path from callgraph root to selected function
        definition      show declaration of selected identifier
        describe        describe selected syntax: definition, methods, etc
        freevars        show free variables of selection
        implements      show 'implements' relation for selected type or method
        peers           show send/receive corresponding to selected channel op
        pointsto        show variables the selected pointer may point to
        referrers       show all refs to entity denoted by selected identifier
        what            show basic information about the selected syntax node
        whicherrs       show possible values of the selected error variable

github codeql -- https://securitylab.github.com/tools/codeql

This is also pretty good for querying ASTs across multiple languages.

another alternative implementation - https://github.com/ofabry/go-callvis

My goal

My main goal is:

Provide a better "git diff" that runs before committing code and detects the following issues:
- changes to the call graph that would otherwise be hard to notice (e.g. calls affected by the change, or other modules affected without the author noticing because they were outside the scope of the change)
- changes to functions that are "risky" (used in many places)
- unwanted changes (e.g. logging, debug statements). These are easy to detect via the call graph and harder to detect in a plain text diff.

Commit metadata and the value of a commit. We often say that writing software is about iteration, but somehow that doesn't apply to writing a commit, where we are expected to get everything right (message, changes within scope, etc.). The tool would:
- provide metadata such as new/modified declarations
- report changes to the call graph
- calculate the "value" of a commit. This is more for work/team management: detecting how often functions are created vs. modified, the scope of changes, and patterns per author.
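
The call graph checks listed above can be sketched as set operations. This is only an illustration under stated assumptions: the call graph of each revision is assumed to be available as a set of (caller, callee) name pairs (for example, derived from guru output), and every function name, prefix, and threshold below is hypothetical.

```python
from collections import Counter

Edge = tuple[str, str]  # (caller, callee)


def diff_call_graph(before: set[Edge], after: set[Edge]) -> dict[str, set[Edge]]:
    """Return the call edges added and removed between two revisions."""
    return {"added": after - before, "removed": before - after}


def unwanted_edges(
    added: set[Edge], prefixes: tuple[str, ...] = ("log.", "fmt.Print")
) -> set[Edge]:
    """Pick out newly added calls whose callee looks like logging or debug output."""
    return {edge for edge in added if edge[1].startswith(prefixes)}


def risky_functions(
    graph: set[Edge], changed: set[str], min_callers: int = 2
) -> dict[str, int]:
    """Report changed functions that have at least `min_callers` distinct callers."""
    # Edges are unique pairs, so counting callees counts distinct callers.
    fan_in = Counter(callee for _, callee in graph)
    return {name: fan_in[name] for name in changed if fan_in[name] >= min_callers}


# Hypothetical graphs: the new revision adds a logging call inside git.Status.
before = {("main.run", "git.Status"), ("devcmd.sync", "git.Status")}
after = before | {("git.Status", "log.Printf")}

changes = diff_call_graph(before, after)
print(changes["added"])
print(unwanted_edges(changes["added"]))
print(risky_functions(after, changed={"git.Status"}))
```

Using fan-in as the measure of "risk" is one possible choice; a real tool might weight callers by package or by test coverage instead.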

Anyway, as soon as I find the bandwidth, I will push out something more concrete.

Nov 06 '20 15:11 hbt

@hbt: Thanks for this, guru looks nice.

Still digesting your message. My early thoughts:

Metadata

Right now, Locust produces output that looks like this:

{
    "initial_ref": "<initial git revision>",
    "terminal_ref": "<terminal git revision>",
    "locust": [ <array of new or modified function and class definitions in the code> ]
}

We should modify this to look like:

{
    "initial_ref": "<initial git revision>",
    "terminal_ref": "<terminal git revision>",
    "summary": {
        "definitions": [ <array from current summary> ],
        "dependencies": [ <changes to symbols used in each scope - module/package/file level, as well as at the level of functions, classes> ],
        "todos": [ <changed (created, updated, or deleted) todos parsed from comments> ],
        "authors": [ <person responsible for changes in each scope> ],
        ....
    }
}

Responsibility for populating this object can lie with plugins. Currently, we have a JS plugin which produces definition changes from JavaScript code. This should be extended so that plugins can populate arbitrary pairs of (metadata type, programming language).
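
One way to picture the (metadata type, programming language) pairing is a registry keyed by that pair. The sketch below is an assumption for discussion, not Locust's actual plugin interface: the `plugin` decorator, the `PLUGINS` registry, `build_summary`, and the two placeholder plugins are all hypothetical names.

```python
from typing import Callable

# A plugin takes the two git revisions and returns a list of metadata entries.
Plugin = Callable[[str, str], list]

# Hypothetical registry: one provider per (metadata type, language) pair.
PLUGINS: dict[tuple[str, str], Plugin] = {}


def plugin(metadata_type: str, language: str):
    """Register a function as the provider for one (metadata type, language) pair."""
    def decorator(fn: Plugin) -> Plugin:
        PLUGINS[(metadata_type, language)] = fn
        return fn
    return decorator


@plugin("definitions", "javascript")
def js_definitions(initial_ref: str, terminal_ref: str) -> list:
    return []  # placeholder: a real plugin would analyze the diff between the refs


@plugin("todos", "go")
def go_todos(initial_ref: str, terminal_ref: str) -> list:
    return []  # placeholder


def build_summary(initial_ref: str, terminal_ref: str, language: str) -> dict:
    """Run every plugin registered for `language` and assemble the proposed object."""
    summary = {
        metadata_type: fn(initial_ref, terminal_ref)
        for (metadata_type, lang), fn in PLUGINS.items()
        if lang == language
    }
    return {"initial_ref": initial_ref, "terminal_ref": terminal_ref, "summary": summary}


print(build_summary("main", "feature", "javascript"))
```

With this layout, adding support for a new language or a new metadata type means registering one more function, without touching the code that assembles the summary.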

Analysis

You mention a few things that fall under "analysis of metadata":

- changes to the callgraphs that would have been hard to notice (e.g. calls that have been affected by change, other modules that were affected by change and was unaware because outside of scope etc.).
- changes to functions that are "risky" (are used a lot)
- unwanted changes (e.g. logging, debug etc.). Easily detectable via callgraph, harder to detect when just text
- calculate the "value" of a commit. This is more for work/team management. Detecting how often functions are created vs modified, the scope of changes, patterns per author.

I see these as use cases Locust can support as a metadata provider. Anyone should be able to use Locust metadata as input to programs which produce these outputs.
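
As a rough illustration of such a downstream program, the sketch below counts created vs. modified definitions in a metadata document of the proposed shape. It relies on an assumption that is not part of the format above: each entry under `definitions` is taken to carry a hypothetical `change` field set to "created" or "modified".

```python
import json
from collections import Counter


def definition_change_counts(metadata_json: str) -> Counter:
    """Count definitions by change kind in a metadata document of the proposed shape."""
    metadata = json.loads(metadata_json)
    definitions = metadata.get("summary", {}).get("definitions", [])
    # ASSUMPTION: each entry has a "change" field; entries without one count as "unknown".
    return Counter(entry.get("change", "unknown") for entry in definitions)


# Hypothetical input; the entry fields are illustrative only.
example = json.dumps({
    "initial_ref": "<initial git revision>",
    "terminal_ref": "<terminal git revision>",
    "summary": {
        "definitions": [
            {"name": "parseConfig", "change": "created"},
            {"name": "loadFile", "change": "modified"},
            {"name": "saveFile", "change": "modified"},
        ]
    },
})

counts = definition_change_counts(example)
print(counts)
```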

Nov 06 '20 22:11 zomglings