
[PROPOSAL] Source code transformations

Open bzz opened this issue 7 years ago • 12 comments

Idea comes from https://github.com/src-d/blog/pull/233#discussion_r209758185

  • Title: Source code match/traverse/transform APIs
  • Author(s): Alex, ?
  • Short description: An overview of existing approaches to match/traverse/transform source code
  • Categories: language analysis
  • Deadlines: no

Table of contents

Very rough grounds that would be covered

OSS:

  • Golang: go fix/go fmt -r
  • Cpp: clang-tidy
  • C: coccinelle
  • Java: JTransformer
  • Example-based refactorings: Java: error-prone/ Golang: eg
  • Python: Bowler

Proprietary/from talks or papers (material)

with some basic examples of using each API, and a conclusion on why bblfsh is the best tool for us.

Management

This section will be filled by @campoy.

  • State: (proposed | writing | written | published)
  • Scheduled:
  • Link to post:

Social Media

  • Wording for tweet:
  • Hashtags:
  • Subreddits:

NOTE Please write in short lines so the review is easier to do.


Preliminary content comes from a previous blog post

{{% center %}} … {{% /center %}}

## Technical details

Following the internal success story of the [ClangMR tool](https://research.google.com/pubs/pub41342.html) for C++,
which matches, traverses and transforms the Abstract Syntax Tree (AST) at scale, similar tooling was built for Java.

{{% youtube ZpvvmvITOrk %}}

Project [Error-Prone](https://github.com/google/error-prone) is a compiler extension that can perform
arbitrary analysis on the *fully typed AST*. Note that such input cannot be obtained with
a parser alone, even one as advanced as [babelfish](https://doc.bblf.sh/): things like symbol resolution
require running a full build. In the end, after running a number of checker plugins,
Error-Prone outputs simple text replacements with suggested code fixes.

The project is open source and is well documented in a [number](https://research.google.com/pubs/pub38275.html)
of [papers](https://research.google.com/pubs/pub41876.html). Another tool, closed source and called JavacFlume,
was built to scale the application of those fixes to the whole codebase. I would guess it looks something like
an Apache Spark job that applies patches in some generic format.

Here is an example of how a full pipeline looks for C++:

{{% grid %}}
{{% caption src="https://cdn-images-1.medium.com/max/4224/1*KpJ5fj4njR1HTDfzhLCQkg.png" title="ClangMR processing pipeline illustration"%}}
“Large-Scale Automated Refactoring Using ClangMR” by
[Hyrum Wright](https://research.google.com/pubs/HyrumWright.html), Daniel Jasper, Manuel Klimek, [Chandler Carruth](https://research.google.com/pubs/ChandlerCarruth.html), Zhanyong Wan
{{% /caption %}}
{{% /grid %}}

Although it is not disclosed, an attentive reader might have noticed that the **Compilation Index** part of the
pipeline is very similar to a [Compilation Database](https://kythe.io/docs/kythe-compilation-database.html)
in the open source Kythe project.

It might be interesting to take a closer look at an example of an AST query and transformation API for C++.

### C++ Example
> *rename all calls to Foo::Bar with 1 argument to Foo::Baz, independent of the name of the instance variable,
> or whether it is called directly or by pointer or reference*

{{% grid %}}
{{% grid-cell %}}
![API example: invoke a callback function on call to Foo::Bar](https://cdn-images-1.medium.com/max/2000/1*vOYemTlJ2QZyzXvizSy5Og.png)
{{% /grid-cell %}}
{{% grid-cell %}}
This fragment will invoke a callback function on any occurrence of a call to *Foo::Bar* with a single argument.
{{% /grid-cell %}}
{{% /grid %}}
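
To make the matching step concrete, here is a minimal sketch that uses Python's standard `ast` module as a stand-in.
It is not the Clang matcher API, and the names `find_single_arg_bar_calls` and `on_match` are made up for illustration.
Because Python's AST is untyped, the sketch matches on the method name `Bar` alone and cannot check that the receiver
is a `Foo`, which is the kind of information a fully typed AST adds.

```python
import ast

SOURCE = """\
foo.Bar(1)
ptr.get().Bar(x)
foo.Bar(1, 2)
foo.Baz(3)
Bar(4)
"""


def find_single_arg_bar_calls(source):
    """Return Call nodes of the form <expr>.Bar(<exactly one argument>)."""
    matches = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)  # method-style call
            and node.func.attr == "Bar"
            and len(node.args) == 1
            and not node.keywords
        ):
            matches.append(node)
    # ast.walk gives no ordering guarantee we want to rely on, so sort by position.
    return sorted(matches, key=lambda n: (n.lineno, n.col_offset))


def on_match(source, call):
    """Hypothetical callback: report where the match is and what it looks like."""
    print(f"line {call.lineno}: {ast.get_source_segment(source, call)}")


for call in find_single_arg_bar_calls(SOURCE):
    on_match(SOURCE, call)
```

The receiver can be any expression (`foo`, `ptr.get()`), which mirrors the "independent of the name of the instance
variable" part of the task; the two-argument call, the call to `Baz` and the free function `Bar(4)` are skipped.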

{{% grid %}}
{{% grid-cell %}}
![API example: replace matching text of the function name with the "Baz"](https://cdn-images-1.medium.com/max/2116/1*JiUgO-gimsIi2JpRB9LYeg.png)
{{% /grid-cell %}}
{{% grid-cell %}}
This callback will generate a code transformation: for the matched nodes, it will replace the matching text of
the function name with “Baz”.

Regarding code transformations in Java, **Error-Prone** has a similar low-level [patching API](http://errorprone.info/docs/patching)
that is very close to the native AST manipulation API. It was found to have a steep learning curve, similar to
Clang's, and thus poses a high entry barrier: even an experienced engineer would need a few weeks before becoming
productive at creating fix suggestions or refactorings.
{{% /grid-cell %}}
{{% /grid %}}
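
The transformation step can be sketched in the same spirit: instead of mutating the tree, the callback records
a text replacement for the function name and the replacements are applied to the source afterwards.
Again, this is a Python `ast` analogue under stated assumptions, not ClangMR or Error-Prone code;
`rename_bar_to_baz` is a hypothetical name, and the column arithmetic assumes ASCII source
(Python reports columns as UTF-8 byte offsets) and Python 3.8+ for the end positions.

```python
import ast


def rename_bar_to_baz(source):
    """Rewrite <expr>.Bar(<one arg>) to <expr>.Baz(<one arg>) via text edits."""
    lines = source.splitlines(keepends=True)
    edits = []  # (line index, start column, end column) of each "Bar" token
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "Bar"
            and len(node.args) == 1
            and not node.keywords
        ):
            attr = node.func
            # The attribute name is the last token of the Attribute node,
            # so the name ends exactly where the node ends.
            end = attr.end_col_offset
            edits.append((attr.end_lineno - 1, end - len("Bar"), end))
    # Apply from the end of the file backwards so earlier offsets stay valid.
    for row, start, end in sorted(edits, reverse=True):
        lines[row] = lines[row][:start] + "Baz" + lines[row][end:]
    return "".join(lines)


print(rename_bar_to_baz("foo.Bar(1)\nfoo.Bar(1, 2)\np.q.Bar(x)\n"))
```

Even this toy version needs knowledge of node types and source offsets, which hints at why working directly
against a native AST API takes time to learn.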

That is why a higher-level API was built for Java: first as the separate [Refaster](https://research.google.com/pubs/pub41876.html)
project, and later [integrated into Error-Prone](http://errorprone.info/docs/refaster).

A usual workflow would then look like this: after running all the checks and emitting a collection of suggested
fixes, shard the diffs into smaller patches, run all the tests over the changes and, if they pass, submit
the patches for code review.
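
As a rough illustration of the sharding step, the sketch below groups changed files by directory,
caps the size of each patch, and keeps only the shards whose tests pass.
The sharding policy, the function names and the `tests_pass` callback are all assumptions made for this example;
the papers do not describe the tooling at this level of detail.

```python
from collections import defaultdict


def shard_by_directory(paths, max_files):
    """Group changed files by parent directory, then cap each shard's size."""
    by_dir = defaultdict(list)
    for path in sorted(paths):
        directory = path.rsplit("/", 1)[0] if "/" in path else ""
        by_dir[directory].append(path)
    shards = []
    for directory in sorted(by_dir):
        files = by_dir[directory]
        for i in range(0, len(files), max_files):
            shards.append(files[i:i + max_files])
    return shards


def select_for_review(shards, tests_pass):
    """Return the shards that would be sent to code review.

    tests_pass is a hypothetical callback that runs the tests affected
    by one shard and returns True when they all pass.
    """
    return [shard for shard in shards if tests_pass(shard)]


changed = ["a/x.java", "a/y.java", "a/z.java", "b/w.java"]
shards = shard_by_directory(changed, max_files=2)
# Stand-in test runner for the example: pretend only the b/ shard fails.
ready = select_for_review(shards, lambda shard: not shard[0].startswith("b/"))
print(ready)
```

Keeping each patch small and local to one directory is one plausible way to make reviews manageable;
a real system would more likely shard by code owner or build target.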

{{% center %}} … {{% /center %}}

{{% center %}}
##### Thank you for reading, stay tuned and keep your codebase healthy!
{{% /center %}}

bzz avatar Aug 28 '18 08:08 bzz

Hey Alex, maybe I'm lacking knowledge here but the title doesn't mean anything to me. Could you make it more beginner friendly?

campoy avatar Oct 12 '18 00:10 campoy

Thank you for the feedback, Francesc! It's totally WIP, as I'm still getting familiar with the existing tools in this field.

The plan is basically to cover some "state of the art" tools for AST transformation (AKA refactoring), so the learnings could be applied to Bblfsh UAST manipulation API.

How about a title along the lines of "Source code transformations"?

OSS:

  • Golang: go fix/go fmt -r
  • Cpp: clang-tidy
  • C: coccinelle
  • Java: JTransformer
  • Example-based refactorings: Java: error-prone/ Golang: eg
  • Python: Bowler, python google/pasta
  • Multilanguage https://comby.dev

Proprietary/from talks or papers (material)

  • ClangMR/JavacFlume
  • Semmle QL (only query)

bzz avatar Oct 24 '18 16:10 bzz

Source code transformations makes it much more clear to me, yeah. Let me know when you have a draft of the blog so I can review.

I'd be curious to see if we can make it so the blog doesn't feel like a series of tools, and instead there's a story tying everything up.

campoy avatar Oct 24 '18 20:10 campoy

> I'd be curious to see if we can make it so the blog doesn't feel like a series of tools, and instead there's a story tying everything up.

That is very useful feedback, thank you. Please let me think more about it. I would expect that even an initial draft will take some time, but I will post it here ASAP.

Thanks again.

bzz avatar Oct 25 '18 08:10 bzz

@campoy One story I can think of is:

take simple-but-educational example(s) of some issue in the code as motivation, and then go through implementing, in each of those systems:

  • code to detect it
  • code to suggest a fix for it

Due to differences in host languages it could be hard to pick a single example, so it can be adjusted a bit for each specific language, keeping it sufficiently high-level.

A nice 🍒 on top could be finishing it with a link to the blog post on "how to wrap it as a lookout analyzer" from #249.

WDYT?

bzz avatar Nov 12 '18 19:11 bzz

I like it. Even if we find an example that only works for a specific language, it should be easy to get people from other language communities to understand the point of the article.

campoy avatar Nov 14 '18 00:11 campoy

Refactoring prolog code: https://pdfs.semanticscholar.org/b48b/bc30427ef7429db83e190f91a579442121b6.pdf

kuba-- avatar Nov 29 '18 18:11 kuba--

@bzz did you get a chance to start a draft ?

vcoisne avatar Nov 30 '18 22:11 vcoisne

Very preliminary. This is fairly ambitious and requires a lot of research. I would expect a shareable draft early next year.

bzz avatar Dec 03 '18 19:12 bzz

@bzz Trying to plan our blog schedule for the upcoming weeks. Did you get a chance to work on this draft ?

vcoisne avatar Jan 18 '19 18:01 vcoisne

@vcoisne I made some progress on the research but am not there yet. I will ping you as soon as I have some results to share!

bzz avatar Jan 24 '19 09:01 bzz

This is still in my backlog.

Two more interesting contenders added to the description:

  • https://github.com/google/pasta for python
  • https://comby.dev for assembly, Bash, C/C++, C#, Clojure, CSS, Dart, Elm, Elixir, Erlang, Fortran, F#, Go, Haskell, HTML/XML, Java, Javascript/Typescript, JSON, Julia, LaTeX, Lisp, OCaml, Pascal, PHP, Python, Ruby, Rust, Scala, SQL, Swift, Text

bzz avatar Oct 29 '19 20:10 bzz