golang callgraph hn discussion/brainstorm
Hello zomglings,
hbt here from https://news.ycombinator.com/item?id=25007701
I'd like to reply to your comment, then explain what I'm planning to build.
Your questions
Q:
Would this be a lot easier in go? The case that scares me a bit in Go is implicit imports (e.g. importing a driver for use with
database/sql). But maybe all that is very clear when doing static analysis?
A: Go already has utilities and APIs for this kind of static analysis.
- guru https://github.com/golang/tools/tree/master/cmd/guru
Example of usage: guru -json -scope "github.com/hbt/fh/cmd/hbt-cli" callers "hbt-cli/lib/util/git/git.go:#645"
Example of output:
[
{
"pos": "/home/hassen/workspace/fh/hbt-cli/lib/cmds/devcmd/gw.go:122:63",
"desc": "static function call",
"caller": "(github.com/hbt/fh/hbt-cli/lib/cmds/devcmd.GitWorkDev).handleCustomProjects"
},
{
"pos": "/home/hassen/workspace/fh/hbt-cli/lib/cmds/devcmd/gw.go:123:63",
"desc": "static function call",
"caller": "(github.com/hbt/fh/hbt-cli/lib/cmds/devcmd.GitWorkDev).handleCustomProjects"
},
{
"pos": "/home/hassen/workspace/fh/hbt-cli/lib/util/goutil/mage.go:71:40",
"desc": "static function call",
"caller": "(github.com/hbt/fh/hbt-cli/lib/util/goutil.MageBuild).BuildLocally"
},
{
"pos": "/home/hassen/workspace/fh/hbt-cli/lib/cmds/go/order.go:32:35",
"desc": "static function call",
"caller": "(github.com/hbt/fh/hbt-cli/lib/cmds/go.GoOrderCmd).Run$1"
},
{
"pos": "/home/hassen/workspace/fh/hbt-cli/lib/cmds/go/impl.go:43:38",
"desc": "static function call",
"caller": "(github.com/hbt/fh/hbt-cli/lib/cmds/go.GoImplementInterfaceCmd).Run"
},
{
"pos": "/home/hassen/workspace/fh/hbt-cli/lib/cmds/selfdev/create.go:61:38",
"desc": "static function call",
"caller": "(github.com/hbt/fh/hbt-cli/lib/cmds/selfdev.CreateCmd).getPackagePath"
}
]
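As a rough sketch of how this output could be consumed, the snippet below counts call sites per caller. It assumes the output is a single JSON array (as the closing `]` suggests) and embeds three entries copied from the sample above. The helper names `count_call_sites` and `split_pos` are made up for illustration and are not part of guru.

```python
import json
from collections import Counter

# Three entries copied from the sample output above, wrapped in a JSON array.
GURU_OUTPUT = """
[
  {
    "pos": "/home/hassen/workspace/fh/hbt-cli/lib/cmds/devcmd/gw.go:122:63",
    "desc": "static function call",
    "caller": "(github.com/hbt/fh/hbt-cli/lib/cmds/devcmd.GitWorkDev).handleCustomProjects"
  },
  {
    "pos": "/home/hassen/workspace/fh/hbt-cli/lib/cmds/devcmd/gw.go:123:63",
    "desc": "static function call",
    "caller": "(github.com/hbt/fh/hbt-cli/lib/cmds/devcmd.GitWorkDev).handleCustomProjects"
  },
  {
    "pos": "/home/hassen/workspace/fh/hbt-cli/lib/util/goutil/mage.go:71:40",
    "desc": "static function call",
    "caller": "(github.com/hbt/fh/hbt-cli/lib/util/goutil.MageBuild).BuildLocally"
  }
]
"""


def count_call_sites(guru_json: str) -> Counter:
    """Count how many call sites each caller contributes (hypothetical helper)."""
    entries = json.loads(guru_json)
    return Counter(entry["caller"] for entry in entries)


def split_pos(pos: str):
    """Split a "pos" string of the form file:line:column into its parts."""
    path, line, col = pos.rsplit(":", 2)
    return path, int(line), int(col)


call_sites = count_call_sites(GURU_OUTPUT)
for caller, n in call_sites.most_common():
    print(n, caller)
```

A function with many distinct callers is a natural candidate for the "risky" label discussed further down.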
The guru tool uses existing libraries to do the AST parsing and provides an API for querying Go programs. It supports the following query modes:
    callees     show possible targets of selected function call
    callers     show possible callers of selected function
    callstack   show path from callgraph root to selected function
    definition  show declaration of selected identifier
    describe    describe selected syntax: definition, methods, etc
    freevars    show free variables of selection
    implements  show 'implements' relation for selected type or method
    peers       show send/receive corresponding to selected channel op
    pointsto    show variables the selected pointer may point to
    referrers   show all refs to entity denoted by selected identifier
    what        show basic information about the selected syntax node
    whicherrs   show possible values of the selected error variable
- GitHub CodeQL -- https://securitylab.github.com/tools/codeql
This is also pretty good for querying ASTs across multiple languages.
- go-callvis, an alternative implementation -- https://github.com/ofabry/go-callvis
My goal
My main goal is:
- Provide a better "git diff" prior to committing code, one that detects the following issues:
  - changes to the callgraph that would be hard to notice (e.g. calls affected by the change, or other modules affected without my being aware because they are outside the scope of the change)
  - changes to functions that are "risky" (used a lot)
  - unwanted changes (e.g. logging, debug statements). These are easy to detect via the callgraph and harder to detect in a plain text diff.
- Commit metadata and the value of a commit. We often say that writing software is about iteration, but somehow that doesn't apply to writing a commit: we are expected to get everything right (message, changes within scope, etc.).
  - provide metadata such as new/modified declarations
  - changes to the callgraph
  - calculate the "value" of a commit. This is more for work/team management: detecting how often functions are created vs. modified, the scope of changes, and patterns per author.
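To make the first goal a bit more concrete, here is a minimal sketch of the callgraph checks, assuming the callgraph is already available as a plain caller -> callees mapping (for example, built from guru or go-callvis output). The function names, the sample graph, and the `threshold` parameter are all hypothetical.

```python
from collections import defaultdict, deque


def reverse_graph(callgraph):
    """Turn a caller -> callees mapping into a callee -> callers mapping."""
    callers = defaultdict(set)
    for caller, callees in callgraph.items():
        for callee in callees:
            callers[callee].add(caller)
    return callers


def affected_by(changed, callgraph):
    """Return every function that directly or transitively calls a changed one."""
    callers = reverse_graph(callgraph)
    seen = set()
    queue = deque(changed)
    while queue:
        fn = queue.popleft()
        for caller in callers.get(fn, ()):
            if caller not in seen:
                seen.add(caller)
                queue.append(caller)
    return seen - set(changed)


def risky(changed, callgraph, threshold=2):
    """Return changed functions with at least `threshold` direct callers."""
    callers = reverse_graph(callgraph)
    fan_in = {fn: len(callers.get(fn, ())) for fn in changed}
    return {fn: n for fn, n in fan_in.items() if n >= threshold}


# Made-up callgraph, for illustration only.
CALLGRAPH = {
    "cmd.main": {"devcmd.Run", "gocmd.Run"},
    "devcmd.Run": {"git.CurrentBranch"},
    "gocmd.Run": {"git.CurrentBranch", "goutil.Build"},
    "goutil.Build": set(),
    "git.CurrentBranch": set(),
}

changed = {"git.CurrentBranch"}
affected = affected_by(changed, CALLGRAPH)
risky_changes = risky(changed, CALLGRAPH)
print(sorted(affected), risky_changes)
```

The breadth-first walk over reversed edges is just one way to find the affected callers. A real tool would also need to decide how to treat dynamic calls (interfaces, function values), which this sketch ignores.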
Anyway, as soon as I find the bandwidth, I will push out something more concrete.
@hbt: Thanks for this, guru looks nice.
Still digesting your message. My early thoughts:
Metadata
Right now, Locust produces output that looks like this:
{
"initial_ref": "<initial git revision>",
"terminal_ref": "<terminal git revision>",
"locust": [ <array of new or modified function and class definitions in the code> ]
}
We should modify this to look like:
{
"initial_ref": "<initial git revision>",
"terminal_ref": "<terminal git revision>",
"summary": {
"definitions": [ <array from current summary> ],
"dependencies": [ <changes to symbols used in each scope - module/package/file level, as well as at the level of functions, classes> ],
"todos": [ <changed (created, updated, or deleted) todos parsed from comments> ],
"authors": [ <person responsible for changes in each scope> ],
....
}
}
The responsibility of populating this object can lie with plugins. Currently, we have a JS plugin which produces definition changes from JavaScript code. This should be extended so that plugins can populate arbitrary pairs of (metadata type, programming language).
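One way this could look, purely as a sketch: a registry keyed by (metadata type, programming language), where each plugin returns a list of entries that get merged into the summary. Nothing here reflects the actual Locust plugin interface. The `register` decorator, the plugin signature, and the entry fields (`name`, `change`, `text`, `language`) are assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical plugin signature: (initial_ref, terminal_ref) -> list of entries.
Plugin = Callable[[str, str], List[dict]]

# Registry keyed by (metadata type, programming language).
PLUGINS: Dict[Tuple[str, str], Plugin] = {}


def register(metadata_type: str, language: str):
    """Decorator that records a plugin under its (metadata type, language) pair."""
    def decorator(fn: Plugin) -> Plugin:
        PLUGINS[(metadata_type, language)] = fn
        return fn
    return decorator


@register("definitions", "javascript")
def js_definitions(initial_ref: str, terminal_ref: str) -> List[dict]:
    # Placeholder result; a real plugin would compare the two revisions.
    return [{"name": "exampleFunction", "change": "modified", "language": "javascript"}]


@register("todos", "go")
def go_todos(initial_ref: str, terminal_ref: str) -> List[dict]:
    # Placeholder result; a real plugin would parse comments in changed files.
    return [{"text": "TODO: example", "change": "created", "language": "go"}]


def build_summary(initial_ref: str, terminal_ref: str) -> dict:
    """Run every registered plugin and merge results by metadata type."""
    summary: Dict[str, List[dict]] = {}
    for (metadata_type, _language), plugin in sorted(PLUGINS.items()):
        summary.setdefault(metadata_type, []).extend(plugin(initial_ref, terminal_ref))
    return {
        "initial_ref": initial_ref,
        "terminal_ref": terminal_ref,
        "summary": summary,
    }


result = build_summary("main", "feature-branch")
print(sorted(result["summary"]))
```

Keying on the pair means a new language or a new metadata type can be added without touching existing plugins.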
Analysis
You mention a few things that fall under "analysis of metadata":
- changes to the callgraph that would be hard to notice (e.g. calls affected by the change, or other modules affected without the author being aware because they are outside the scope of the change)
- changes to functions that are "risky" (used a lot)
- unwanted changes (e.g. logging, debug statements). These are easy to detect via the callgraph and harder to detect in a plain text diff.
- calculate the "value" of a commit. This is more for work/team management: detecting how often functions are created vs. modified, the scope of changes, and patterns per author.
I see these as use cases that Locust can support as a metadata provider. Anyone should be able to use Locust metadata as input to programs which produce these outputs.