[feature] Plugin system
⭐ Suggestion
It would be super helpful to have some sort of plugin system in ast-grep. There are many different ways to think about this, but it could look like:
- ast-grep defining a trait which plugins must adhere to (maybe it takes in serialized value, produces a match result, list of ranges, etc.)
- plugins build their own object which implements that trait, and compile it to a shared object
- ast-grep users can register this plugin, and ast-grep will then load the shared object in and use it to do custom rules/matching
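To make the three bullets above a bit more concrete, here is a minimal sketch of what such a trait could look like. Everything in it (`MatcherPlugin`, `PluginMatch`, `ExactText`, and the method signatures) is hypothetical and invented for illustration; none of it is an existing ast-grep API, and the serialized config is simplified to a plain string.

```rust
use std::ops::Range;

/// Hypothetical result a plugin hands back to the host.
#[derive(Debug, PartialEq)]
pub struct PluginMatch {
    /// Whether the candidate node satisfies the plugin's check.
    pub matched: bool,
    /// Byte ranges in the source that the plugin wants to report.
    pub ranges: Vec<Range<usize>>,
}

/// Hypothetical trait that every plugin would implement (not an ast-grep API).
pub trait MatcherPlugin {
    /// Name used to refer to the plugin from a rule file.
    fn name(&self) -> &str;

    /// `config` is the serialized rule fragment for this plugin,
    /// `source` is the file text, and `node` is the candidate's byte range.
    fn matches(&self, config: &str, source: &str, node: Range<usize>) -> PluginMatch;
}

/// Toy plugin: matches when the node text is exactly the configured string.
pub struct ExactText;

impl MatcherPlugin for ExactText {
    fn name(&self) -> &str {
        "exactText"
    }

    fn matches(&self, config: &str, source: &str, node: Range<usize>) -> PluginMatch {
        // `get` returns None for out-of-bounds ranges instead of panicking.
        let matched = source.get(node.clone()) == Some(config);
        let ranges = if matched { vec![node] } else { Vec::new() };
        PluginMatch { matched, ranges }
    }
}
```

A real design would likely pass richer node information than a byte range, but the shape (serialized config in, match result plus ranges out) follows the description above.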
💻 Use Cases
For example, we could hook in an LSP symbols provider to get something like the following:
    id: find-calls-to-react-components
    language: tsx
    rule:
      pattern: $COMPONENT($ARGS)
    constraint:
      COMPONENT:
        definitionMatches: # match against a symbol's *definition*
          pattern: function $NAME() { $$$ }
          has:
            # we assume a function that returns JSX is a component
            pattern: return <$X $$$ />
    fix: <$COMPONENT ...$ARGS />
It could allow us to get semantic information about a symbol, rather than just syntactic information.
Related to
https://github.com/ast-grep/ast-grep/issues/334
https://github.com/ast-grep/ast-grep/issues/433
(Just a note: these are similar requests, but still different from this one, which would allow for arbitrary new features depending on the plugin)
Several implementation questions:
- how to parse unknown fields? https://github.com/de-vri-es/serde-ignored-fields/
- how to register custom field schema?
- how to load plugin? wasm? dylib? or builtin?
- what data should a plugin receive, and what output should it produce?
1. Maybe we could do it like constraint, allowing arbitrary field names for children, and their values are always a rule?

        id: find-calls-to-react-components
        language: tsx
        rule:
          pattern: $COMPONENT($ARGS)
        constraint:
          COMPONENT:
            pluginMatchers: # names of plugins, mapped to their rule
              definitionMatches: # match against a symbol's *definition*
                pattern: function $NAME() { $$$ }
                has:
                  # we assume a function that returns JSX is a component
                  pattern: return <$X $$$ />
        fix: <$COMPONENT ...$ARGS />

2. This is basically answered by 1., we can apply the same schema style as we do to constraint.
3. I was thinking dylib, but wasm is also an interesting idea. I'd need to do more research on how to do this/what would be best, to be honest.
4. I think as input, if it just received the SgNode(s) from the previous matched rule (in this example, COMPONENT and ARGS), this would be enough information to do a lot of stuff. E.g. read the node's document range to query for the symbol's definition.
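As a rough illustration of answers 1 and 4, the sketch below maps plugin names (the keys under `pluginMatchers`) to callbacks that receive the captured nodes. `NodeInfo`, `PluginRegistry`, and `example_registry` are hypothetical names; `NodeInfo` only stands in for whatever an `SgNode` would expose, and the capitalized-name check is a toy placeholder for a real `definitionMatches` lookup.

```rust
use std::collections::HashMap;
use std::ops::Range;

/// Hypothetical stand-in for the data carried by a matched SgNode.
#[derive(Debug, Clone)]
pub struct NodeInfo {
    pub text: String,
    pub range: Range<usize>,
}

/// A plugin receives the captured meta-variables and decides
/// whether the constraint holds.
pub type PluginFn = Box<dyn Fn(&HashMap<String, NodeInfo>) -> bool>;

/// Hypothetical registry keyed by the field name used in the rule file.
#[derive(Default)]
pub struct PluginRegistry {
    plugins: HashMap<String, PluginFn>,
}

impl PluginRegistry {
    pub fn register(&mut self, name: &str, plugin: PluginFn) {
        self.plugins.insert(name.to_string(), plugin);
    }

    /// Returns None when no plugin is registered under `name`,
    /// so the host can report an unknown field instead of silently passing.
    pub fn check(&self, name: &str, captures: &HashMap<String, NodeInfo>) -> Option<bool> {
        self.plugins.get(name).map(|plugin| plugin(captures))
    }
}

/// Builds a registry with one toy plugin registered as `definitionMatches`.
/// Assumption: a capitalized name is treated as a component; a real plugin
/// would use the node's range to look up the symbol's definition.
pub fn example_registry() -> PluginRegistry {
    let mut registry = PluginRegistry::default();
    registry.register(
        "definitionMatches",
        Box::new(|captures: &HashMap<String, NodeInfo>| {
            captures
                .get("COMPONENT")
                .and_then(|node| node.text.chars().next())
                .map_or(false, |first| first.is_uppercase())
        }),
    );
    registry
}
```

Returning `Option<bool>` from `check` is one way to separate "the plugin rejected the node" from "no such plugin", which ties into the question of how unknown fields should be parsed.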
+1 to this!
We are building off of ast-grep with a bunch of custom logic (using rust APIs directly) and this could be a great way for us to contribute back.
Relevant links:
- GitLab Code Parser: https://gitlab.com/groups/gitlab-org/-/epics/17516
- Ruby Parser example with basic data flow tracing: https://gitlab.com/gitlab-org/rust/gitlab-code-parser/-/merge_requests/1
- Knowledge Graph (the larger goal): https://gitlab.com/groups/gitlab-org/-/epics/17514
I really like the idea of making constraints extensible via plugins. When analyzing Angular code, there are cases where I need to look at another file in order to tell something about a symbol in the current file. For example, an identifier is a component if another file contains a class with the same name that has a @Component() decorator on it. When analyzing components, we might want to tell whether a certain variable is used in that component's template (defined in a separate file) or not. I think constraints extensibility would make that kind of thing really easy.
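A cross-file constraint like the Angular example could be backed by an index built once over the project. The sketch below is a deliberately naive, line-based illustration using only the standard library; `index_components` and `is_component` are hypothetical helpers, and a real plugin would presumably use a parser rather than string prefixes.

```rust
use std::collections::HashSet;

/// Collects the names of classes that follow an `@Component(...)` decorator.
/// `files` is a list of (path, source) pairs. Naive line-based scan.
pub fn index_components(files: &[(&str, &str)]) -> HashSet<String> {
    let mut components = HashSet::new();
    for (_path, source) in files {
        // Set when a decorator was seen and its class has not appeared yet.
        let mut saw_decorator = false;
        for line in source.lines() {
            let line = line.trim();
            if line.starts_with("@Component") {
                saw_decorator = true;
            } else if saw_decorator {
                let rest = line
                    .strip_prefix("export class ")
                    .or_else(|| line.strip_prefix("class "));
                if let Some(rest) = rest {
                    // The class name runs up to the first non-identifier character.
                    let name: String = rest
                        .chars()
                        .take_while(|c| c.is_alphanumeric() || *c == '_')
                        .collect();
                    components.insert(name);
                    saw_decorator = false;
                }
            }
        }
    }
    components
}

/// The constraint itself: is this identifier a known component?
pub fn is_component(index: &HashSet<String>, identifier: &str) -> bool {
    index.contains(identifier)
}
```

The point of the sketch is the split between a project-wide indexing step and a cheap per-node lookup, which is information a purely syntactic, single-file rule cannot provide.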
For loading, part of me leans towards wasm because compiling a dylib and loading it in CI environments has already been tricky (compiling a tree-sitter grammar to use as a custom language requires setting things up weirdly). Extism is a good library for wasm plugin systems. However, wasm would limit native tool access and might limit integrations with things like language-specific tooling.
> I was thinking dylib, but wasm is also an interesting idea. I'd need to do more research on how to do this/what would be best, to be honest

> I think as input, if it just received the SgNode(s) from the previous matched rule (in this example, COMPONENT and ARGS), this would be enough information to do a lot of stuff. E.g. read node's document range to query for the symbol's definition
Questions 3 and 4 are related. Both wasm and dylib need a stable ABI, so the design of the plugin's input and output may affect which runtime we choose.
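To illustrate why the input/output design and the ABI are tied together, here is a sketch of what a C-compatible boundary might look like for the dylib route. All names (`NodeInput`, `MatchOutput`, `PluginEntry`, `uppercase_initial`, `call_plugin`) are hypothetical. The entry point is called directly here; an actual dylib would additionally need to export the symbol unmangled and be loaded dynamically.

```rust
/// FFI-safe view of a matched node: a borrowed byte slice plus offsets.
/// Only plain integers and raw pointers cross the boundary.
#[repr(C)]
pub struct NodeInput {
    pub text_ptr: *const u8,
    pub text_len: usize,
    pub start_byte: usize,
    pub end_byte: usize,
}

/// FFI-safe result: a flag plus the range the plugin reports.
#[repr(C)]
#[derive(Debug, PartialEq)]
pub struct MatchOutput {
    pub matched: bool,
    pub start_byte: usize,
    pub end_byte: usize,
}

/// Signature every plugin entry point would have to follow.
pub type PluginEntry = unsafe extern "C" fn(*const NodeInput) -> MatchOutput;

/// Toy plugin entry point: matches nodes whose text starts with an
/// ASCII uppercase letter.
pub unsafe extern "C" fn uppercase_initial(input: *const NodeInput) -> MatchOutput {
    let no_match = MatchOutput { matched: false, start_byte: 0, end_byte: 0 };
    if input.is_null() {
        return no_match;
    }
    // Caller contract: `input` points to a valid NodeInput for this call.
    let node = unsafe { &*input };
    if node.text_ptr.is_null() || node.text_len == 0 {
        return no_match;
    }
    // Caller contract: `text_ptr` is valid for `text_len` bytes.
    let bytes = unsafe { std::slice::from_raw_parts(node.text_ptr, node.text_len) };
    if bytes[0].is_ascii_uppercase() {
        MatchOutput { matched: true, start_byte: node.start_byte, end_byte: node.end_byte }
    } else {
        no_match
    }
}

/// Host-side helper that builds the FFI input from a string slice.
pub fn call_plugin(entry: PluginEntry, text: &str, start_byte: usize) -> MatchOutput {
    let input = NodeInput {
        text_ptr: text.as_ptr(),
        text_len: text.len(),
        start_byte,
        end_byte: start_byte + text.len(),
    };
    // `input` and `text` outlive the call, so the pointers stay valid.
    unsafe { entry(&input) }
}
```

Keeping the boundary to flat `#[repr(C)]` data like this is one reason the choice of inputs matters: anything richer, such as a full tree handle, would have to be serialized or wrapped behind further FFI calls, for wasm as much as for dylib.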