
[feature] Plugin system

Open ribru17 opened this issue 11 months ago • 7 comments

⭐ Suggestion

It would be super helpful to have some sort of plugin system in ast-grep. There are many different ways to think about this, but it could look like:

  • ast-grep defining a trait which plugins must adhere to (maybe it takes in serialized value, produces a match result, list of ranges, etc.)
  • plugins build their own object which implements that trait, and compile it to a shared object
  • ast-grep users can register this plugin, and ast-grep will then load the shared object in and use it to do custom rules/matching
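As a rough illustration of the three steps above, here is a minimal Rust sketch. Everything in it is hypothetical: `MatcherPlugin`, `MatchRange`, `SubstringPlugin` and `PluginRegistry` are made-up names, not ast-grep APIs, and the registry holds plugins in-process rather than loading a shared object. It only shows the shape of "serialized value in, list of ranges out".

```rust
use std::collections::HashMap;

/// Byte range of one match in the source text.
#[derive(Debug, Clone, PartialEq)]
pub struct MatchRange {
    pub start: usize,
    pub end: usize,
}

/// Hypothetical trait a plugin would implement (not an ast-grep API).
pub trait MatcherPlugin {
    /// Name used to refer to the plugin from a rule file.
    fn name(&self) -> &str;
    /// Takes a serialized config value plus the source text and
    /// returns the ranges that matched.
    fn find_matches(&self, config: &str, source: &str) -> Vec<MatchRange>;
}

/// Toy plugin: treats `config` as a literal substring to search for.
pub struct SubstringPlugin;

impl MatcherPlugin for SubstringPlugin {
    fn name(&self) -> &str {
        "substring"
    }

    fn find_matches(&self, config: &str, source: &str) -> Vec<MatchRange> {
        if config.is_empty() {
            return Vec::new();
        }
        source
            .match_indices(config)
            .map(|(start, m)| MatchRange { start, end: start + m.len() })
            .collect()
    }
}

/// In-process stand-in for "register a plugin, then look it up by name".
#[derive(Default)]
pub struct PluginRegistry {
    plugins: HashMap<String, Box<dyn MatcherPlugin>>,
}

impl PluginRegistry {
    pub fn register(&mut self, plugin: Box<dyn MatcherPlugin>) {
        self.plugins.insert(plugin.name().to_string(), plugin);
    }

    /// Returns `None` when no plugin with that name is registered.
    pub fn run(&self, name: &str, config: &str, source: &str) -> Option<Vec<MatchRange>> {
        self.plugins.get(name).map(|p| p.find_matches(config, source))
    }
}
```

A real implementation would replace `register` with whatever loading mechanism is chosen (shared object, wasm module, or builtin), which is why the trait is kept free of anything that assumes a particular runtime.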

💻 Use Cases

For example, we could hook in an LSP symbols provider to get something like the following:

id: find-calls-to-react-components
language: tsx
rule:
  pattern: $COMPONENT($ARGS)
constraint:
  COMPONENT:
    definitionMatches:  # match against a symbol's *definition*
      pattern: function $NAME() { $$$ }
      has:
        # we assume a function that returns JSX is a component
        pattern: return <$X $$$ />
fix: <$COMPONENT ...$ARGS />

It could allow us to get semantic information about a symbol, rather than just syntactic information.

ribru17 avatar Apr 29 '25 19:04 ribru17

Related to

https://github.com/ast-grep/ast-grep/issues/334

https://github.com/ast-grep/ast-grep/issues/433

HerringtonDarkholme avatar May 02 '25 18:05 HerringtonDarkholme

(Just a note: these are similar requests, but still different from this one, which would allow for arbitrary new features depending on the plugin)

ribru17 avatar May 06 '25 16:05 ribru17

Several implementation questions:

  1. how to parse unknown fields? https://github.com/de-vri-es/serde-ignored-fields/
  2. how to register custom field schema?
  3. how to load plugin? wasm? dylib? or builtin?
  4. what data should a plugin receive, and what output should it produce?
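To make questions 1 and 2 concrete, here is a small Rust sketch of one possible approach: split a rule's fields into builtin ones and ones claimed by a registered plugin, and reject anything else. This is an assumption-heavy illustration, not how ast-grep parses rules. `split_fields` and `SplitFields` are hypothetical names, `BUILTIN_FIELDS` lists only a subset of the real rule keys, and field values are simplified to plain strings instead of nested rules.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// A subset of ast-grep's builtin rule keys, for illustration only.
const BUILTIN_FIELDS: &[&str] = &["pattern", "kind", "regex", "has", "inside"];

/// Rule fields partitioned by who is responsible for them.
#[derive(Debug, PartialEq)]
pub struct SplitFields {
    pub builtin: BTreeMap<String, String>,
    pub plugin: BTreeMap<String, String>,
}

/// Hypothetical helper: `registered` holds the field names that loaded
/// plugins have declared. Unknown names are an error, so typos in a
/// rule file are still caught.
pub fn split_fields(
    raw: &BTreeMap<String, String>,
    registered: &BTreeSet<String>,
) -> Result<SplitFields, String> {
    let mut out = SplitFields {
        builtin: BTreeMap::new(),
        plugin: BTreeMap::new(),
    };
    for (key, value) in raw {
        if BUILTIN_FIELDS.contains(&key.as_str()) {
            out.builtin.insert(key.clone(), value.clone());
        } else if registered.contains(key) {
            out.plugin.insert(key.clone(), value.clone());
        } else {
            return Err(format!("unknown rule field `{key}`"));
        }
    }
    Ok(out)
}
```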

HerringtonDarkholme avatar May 06 '25 16:05 HerringtonDarkholme

  1. Maybe we could do it like constraint, allowing arbitrary field names for children, and their values are always a rule?
    id: find-calls-to-react-components
    language: tsx
    rule:
      pattern: $COMPONENT($ARGS)
    constraint:
      COMPONENT:
        pluginMatchers: # names of plugins, mapped to their rule
          definitionMatches:  # match against a symbol's *definition*
            pattern: function $NAME() { $$$ }
            has:
              # we assume a function that returns JSX is a component
              pattern: return <$X $$$ />
    fix: <$COMPONENT ...$ARGS />
    
  2. This is basically answered by 1., we can apply the same schema style as we do to constraint
  3. I was thinking dylib, but wasm is also an interesting idea. I'd need to do more research on how to do this/what would be best, to be honest
  4. I think as input, if it just received the SgNode(s) from the previous matched rule (in this example, COMPONENT and ARGS), this would be enough information to do a lot of stuff. E.g. read the node's document range to query for the symbol's definition
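A minimal Rust sketch of the input described in point 4, under stated assumptions: `CapturedNode` is a simplified stand-in for ast-grep's SgNode that carries only text and a byte range, and the `definitions` map stands in for whatever a symbol provider such as an LSP server would return. `definition_matches` is a hypothetical function, and the substring check is a placeholder for matching a real rule against the definition.

```rust
use std::collections::HashMap;

/// Simplified stand-in for an SgNode: just its text and byte range.
#[derive(Debug, Clone)]
pub struct CapturedNode {
    pub text: String,
    pub start: usize,
    pub end: usize,
}

/// Meta-variable name (e.g. "COMPONENT") mapped to the node it captured.
pub type Captures = HashMap<String, CapturedNode>;

/// Hypothetical plugin entry point: look up the definition of the symbol
/// captured by `var` and report whether it contains `needle`.
/// `definitions` maps a symbol name to its definition source; a real
/// plugin would resolve this from the node's range instead.
pub fn definition_matches(
    captures: &Captures,
    var: &str,
    definitions: &HashMap<String, String>,
    needle: &str,
) -> bool {
    captures
        .get(var)
        .and_then(|node| definitions.get(&node.text))
        .map_or(false, |def| def.contains(needle))
}
```

With a capture of `Button` whose definition contains `return <`, the function returns true; with a capture whose definition returns a plain value, or with no known definition, it returns false.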

ribru17 avatar May 06 '25 16:05 ribru17

+1 to this!

We are building off of ast-grep with a bunch of custom logic (using rust APIs directly) and this could be a great way for us to contribute back.

Relevant links:

  • GitLab Code Parser: https://gitlab.com/groups/gitlab-org/-/epics/17516
  • Ruby Parser example with basic data flow tracing: https://gitlab.com/gitlab-org/rust/gitlab-code-parser/-/merge_requests/1
  • Knowledge Graph (the larger goal): https://gitlab.com/groups/gitlab-org/-/epics/17514

michaelangeloio avatar May 06 '25 17:05 michaelangeloio

I really like the idea of having constraints being extensible via plugins. There are cases when analyzing Angular code where I need to look at another file in order to tell something about some symbol in the current file. For example, an identifier is a component if in another file there's a class with the same name that has a @Component() decorator on it. When analyzing components, we might want to tell whether a certain variable is used in the template of that component or not (defined in a separate file). constraints extensibility would I think make it possible to do that stuff really easily.

For loading, part of me leans towards wasm because in CI environments trying to compile a dylib and load it has been tricky already (compiling a tree-sitter grammar to use as a custom language requires setting things up weirdly). Extism is a good library for wasm plugin systems. However, wasm would limit native tool access and might limit integrations with things like language-specific tooling.

samwightt avatar May 14 '25 16:05 samwightt

I was thinking dylib, but wasm is also an interesting idea. I'd need to do more research on how to do this/what would be best, to be honest

I think as input, if it just received the SgNode(s) from the previous matched rule (in this example, COMPONENT and ARGS), this would be enough information to do a lot of stuff. E.g. read the node's document range to query for the symbol's definition

Questions 3 and 4 are related. Both wasm and dylib need a stable ABI, so the design of the input/output may affect the choice of plugin runtime.
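To show why the data shape and the runtime are tied together, here is a hedged Rust sketch of what a C-compatible plugin boundary might look like. All names (`NodeView`, `MatchFn`, `starts_uppercase`, `call_plugin`) are hypothetical and nothing here is an ast-grep interface. The point is that only plain pointers, lengths, and integers cross the boundary, since Rust's own types have no stable layout.

```rust
/// C-compatible view of a captured node. It borrows the host's source
/// buffer, so the plugin must not keep the pointer after the call.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct NodeView {
    pub text_ptr: *const u8,
    pub text_len: usize,
    pub start_byte: usize,
    pub end_byte: usize,
}

/// Signature a plugin would export: 1 means match, 0 means no match.
pub type MatchFn = extern "C" fn(node: NodeView) -> u8;

/// Example plugin function. In a real dylib it would also need an
/// unmangled symbol name so the host can look it up.
pub extern "C" fn starts_uppercase(node: NodeView) -> u8 {
    if node.text_ptr.is_null() || node.text_len == 0 {
        return 0;
    }
    // SAFETY: the host promises that ptr/len describe readable bytes
    // that stay valid for the duration of this call.
    let bytes = unsafe { std::slice::from_raw_parts(node.text_ptr, node.text_len) };
    bytes[0].is_ascii_uppercase() as u8
}

/// Host-side helper that builds the view and calls the plugin.
pub fn call_plugin(plugin: MatchFn, text: &str, start_byte: usize) -> bool {
    let view = NodeView {
        text_ptr: text.as_ptr(),
        text_len: text.len(),
        start_byte,
        end_byte: start_byte + text.len(),
    };
    plugin(view) != 0
}
```

A wasm runtime would need a similar flattening, except that the pointer becomes an offset into the module's linear memory and the host has to copy the text in first. Richer inputs, such as a whole syntax tree, make both options harder, which is one reason to settle the input/output design before picking the runtime.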

HerringtonDarkholme avatar May 25 '25 01:05 HerringtonDarkholme