refactor-nrepl Outline work needed for cljs and cljc support

I'd like to use this issue to start a discussion on what needs to be done in order to add support for cljs and cljc.

Most of what I consider to be the important functionality involves the analyzer and is unavailable outside of clj files.

I'm positive others would like to improve this state too, but for someone who's not familiar with the project trying to tackle these issues seems a bit hopeless.

It would be helpful to start outlining what the current problems are, what solutions could work and lay a path for people who are interested in working on this but don't know where to start.

Some questions I currently have in mind:

Why does the current approach cannot work for cljs? This is not specific to refactor-nrepl, but some pointers on how exactly does the analysis process differ between clj and cljs would make a good resource.
In principle, are all the existing middlewares applicable to a cljs ast? It is unclear if there are platform-specific blockers that would prevent this from happening.
What are some potential solutions for implementing the existing functionality in cljs? Some high level on how the cljs runtime could integrate with the rest of the middleware.
Are there any potential problems with having to deal with two distinct analysis results?

Cheers!

Apr 15 '17 13:04 julienfantin

awesome you opened this. let me collect my thoughts and previous discussions around this before answering in detail

Apr 15 '17 13:04 benedekfazekas

What strategies could be used for testing? How do we prevent regressions and assert that any given feature works in all three contexts? There seems to be a strategy in place already with rest/resources but it's not clear whether this would scale? Also the current model could use some documentation if there is indeed a method to it.

Apr 15 '17 14:04 julienfantin

Just after a bit of analysis the list of refactor-nrepl functions using ASTs and the list of cljr features using these refactor-nrepl functions.

refactor-nrepl.find.find-locals
- promote fn
- extract function
- inline symbol
- find usages/rename symbol
- create-fn from example
- change fn signature
refactor-nrepl.find.find-symbol
- find local and global sybmol
- rename local and global symbol
refactor-nrepl.find.find-used-publics
- replace refer all

and now your questions

Why does the current approach cannot work for cljs? This is not specific to refactor-nrepl, but some pointers on how exactly does the analysis process differ between clj and cljs would make a good resource.

Essentially the current approach does not work for cljs because the analyzer we've choosen (clojure contrib analyzer: org.clojure/tools.analyzer) does not have a working cljs/js implementation. The existing cljs/js implementation has been abandoned due to lack of time to keep up with cljs, see https://github.com/clojure/tools.analyzer.js#this-project-has-been-abandoned.

So as simple as that we decided to "buy" an analyzer solution and build features on top of it expecting the cljs part working too which is not the case unfortunately.

There is one exception, clean-ns which uses its own means to gather information about the namespace it works on. It does not use the AST and supports clj, cljs and cljc.

This answers your 2nd question too I suppose. In the context of clojure.contrib analyzer there is no such thing as cljs AST at the moment unfortunately.

What are some potential solutions for implementing the existing functionality in cljs? Some high level on how the cljs runtime could integrate with the rest of the middleware.

In short find an other solution the get the necessary information about the clj/cljs code we work with. Apart from the missing cljs story of the currently used analyzer there are other problems with it. A good discussion of these problems by some much brighter ppl than myself can be found here: #134.

The above discussion touches on a load of things so perhaps a very careful thinking over all those points raised could be benefitial. Might raise new questions etc.

(I guess it would be also possible to pick up the maintenance of org.clojre/tools.analyzer.js No idea how much work is that but I am guessing quite a big amount. Also because of the other problems of the tools.analyzer in the context of tooling I would not prefer this way.)

A bit about options on potential solutions

One option is to run our own minimalistic analyzer. clean-ns does this in a very very limited sense. It collects all symbols in a file using tools.reader, see details in the refactor-nrepl.find.symbols-in-file namespace. We could go down this route and build our own minimal ASTs collecting only the info we need. This is in a way what cursive does.

As the cljs story is really getting more and more interesting these days we might have a look on self hosted cljs repls like lumo and planck if they could help us out for cljs.

Also joker looks very interesting. If I understand the impliciations there it should support both clj and cljs. Also as implemented in go it does not need a running REPL. On the flip side tho there are limitations and it is mentioned in the project's non-goals section that feature parity with clojure is not in scope of the project. this may bite us later if we go down this way. we should also consider the fact that going this way would break the architecture compatibility between cider and clj-refactor/refactor-nrepl. this could be a good thing in the sense if this works out that might be a way for cider to go as well in the future (talking about a really huge rewrite now). but also it would very likely defeat my goal of the past year of merging cljr gradually into cider.

disclaimer: although i am very excited to do a POC with joker i have very limited time nowadays to hack on these things unfortunately.

disclaimer: this is more like a vision like thing than anything else and mainly mine. other people in the team and around all things clojure in emacs may not agree with this last section.

i appreciate i might not answered all your questions but hopefully gave some food for thoughts. let me know what you think and if this in any way helpful or just even more confusing...

Apr 16 '17 09:04 benedekfazekas

@benedekfazekas thanks fot the great response.

Please allow me to not address your points right away, because I realize there are some big gaps in my understanding of the various analyzer toolchains.

I was under the impression we'd have no choice but to use https://github.com/clojure/clojurescript/blob/master/src/main/clojure/cljs/analyzer/api.cljc or maybe https://github.com/clojure/clojurescript/blob/master/src/main/clojure/cljs/analyzer.cljc

I found an old post that does clarify some of the goals of tools.analyzer here and puts thnigs in perspective.

It is not clear to me whether using the cljs analyzer directly would be possible. I guess this is a downside of the data-driven approach right here: I cannot look at the refactor-nrepl codebase and easily determine what analyzer functionality it depends on. There is just one or two calls into the library but the rest of the project depends on specific ast keys.

I think what would go a long way towards determining the right path forward would be to have an exhaustive reference of the ast data that's actually used throughout the project. Then we could look at the outputs of different alternatives and easily determine what's different and/or missing. Am I correct that this would help?

If so, maybe a test namespace where each middleware would have to assert against an analysis pass checking for its dependencies, or maybe better, a set of specs for the ast?

Edit: I'm going off another assumption here, which is that with some undetermined amount of shims, we'd be able to build upon the clojurescript analyzer directly and recreate the ast that refactor-nrepl currently requires. I now realize sounds a lot like rewriting a subset of tools.analyzer.js, and without familiarity with those tools it's unclear how much work this represents.

Apr 16 '17 15:04 julienfantin

I think what would go a long way towards determining the right path forward would be to have an exhaustive reference of the ast data that's actually used throughout the project. Then we could look at the outputs of different alternatives and easily determine what's different and/or missing. Am I correct that this would help?

I don't think so. The AST is the API. That was part of the reason in choosing tools.analyzer to begin with, that we could get a general AST (whether the source was clj, cljs or cljc) and then do work on that AST.

If that's no longer viable, we should look into replicating the capabilities afforded by tools.analyzer. @benedekfazekas listed those above.

I'm not aware of another library that does analysis, like tools.analyzer, but we might be able to get the same capabilities by running the cljs compiler.

This takes us to point 4. in your OP:

Are there any potential problems with having to deal with two distinct analysis results?

No, there isn't. And this is starting to look like the only way forward. The lack of progress is mostly because neither @benedekfazekas or myself write cljs in our day to day job and nobody else has shown any interest in helping us out (until now? :))

Apr 16 '17 16:04 expez

I don't think so. The AST is the API. That was part of the reason in choosing tools.analyzer to begin with, that we could get a general AST (whether the source was clj, cljs or cljc) and then do work on that AST.

Right, the point I was trying to make is: if I run a cljc file through both tools.analyzer.jvm and clojurescript.analyzer I'll get different asts even if the file has no reader conditionals. How different is the big question here, and I haven't checked yet, but I'm guessing substantially different given how much code there is in tools.analyzer.js.

So imagine we were to diff those two asts and make fixing those discrepancies our todo list, we potentially could ignore a bunch of those if we limited ourselves to what refactor-nrep uses, the API would then become a subset of the tools.analyzer AST.

Does that make any sense or am I missing something?

Apr 16 '17 17:04 julienfantin

I think you're right that the ASTs will be nothing alike, and that's fine. Even if it was possible to find common ground in the two ASTs and normalize them into some more generalized thing, I don't think that would be a good idea. We'd have to track changes in both projects to keep our normalization step in sync.

I think a much better approach would be to just add a new code path to e.g. find-symbol which uses clojurescript.analyzer, or something else, to find symbol occurrences.

Apr 16 '17 17:04 expez

If you want to talk in real-time, you can find us both on gitter. I'm around on freenode and I think @benedekfazekas still hangs out on slack (in the clojure channel)

Apr 16 '17 17:04 expez

So now, what about: https://github.com/clojure/jvm.tools.analyzer#usage-clojurescript?

The landscape is even more confusing than I initially though, thinking out loud here in hopes that you guys can correct me if I'm missing something:

tools.analyzer is an umbrella project that was aiming to provide a consistent, platform agnostic ast. What I didn't know is that it seems to re-implement the compilation process of the target. Which actually would explain why I sometimes get weird analysis error messages for code that seems to compile just fine.
tools.analyzer.jvm is the clojure backend refactor-nrepl currently uses.
tools.analyzer.js would have been our best option if it were maintained since it's another backend for tools.analyzer. However it's no wonder it's been abandoned if they had to re-implement the compiler given how fast clojurescript changes.
clojurescript.analyzer is a possible integration target, though it will require multiplying code paths in refactor-nrepl and might be difficult to support across clojurescript version?
jvm.tools.analyzer is yet another analyzer project that we haven't mentioned yet. Despite its name it seems to also support cljs and uses both the regular clojure compiler and clojurescript.analyzer.

At first glance jvm.tools.analyzer seems interesting since it's the only project that could give us a unified api? There are various gotchas listed in the readme, like the analysis causing code to be eval'd, which I think relates to #134 mentioned earlier, and I'm not sure how different its output is from tools.analyzer.jvm.

Apr 16 '17 20:04 julienfantin

Your concerns for clojurescript.analyzer would be true for jvm.tools.analyzer as well, since that's just wrapping clojurescript.analyzer in addition to the jvm clojure compiler.

For that reason, we might as well use jvm.tools.analyzer to access clojurescript.analyzer.

If you want to get involved @julienfantin, I suggest the following route forward:

Write new code to replicate the current clj functionality for cljs, using jvm.tools.analyzer.
Once we're feature complete, investigate what it would take to use the newly written code to replace the old code (differences in the ASTs, caching / other performance concerns)
Throw out tools.analyzer.

I don't think we should care about duplicating code while investigating. I'd rather have some extra stuff, than break the currently working clj features while adding cljs support.

It's also too early to worry about supporting multiple versions of cljs. Tracking the latest major release is miles better than no support at all. Most likely the cljs analyzer isn't moving very fast, even if cljs itself is!

Apr 16 '17 22:04 expez

Ok that sounds like a plan!

I do have a couple of remaining questions and a comment:

I'm not familiar with the project's architecture and design so when you talk about duplicating code, I'm not sure what you have in mind and where this duplication would start. Would it be entirely on the clojure side, say something like modifying an operation's nrepl message handler to branch between calling one, the other, or both analysis processes depending on the file-type? Or did you mean defining separate refactorings all the way to the emacs client side?
Something we haven't discussed yet, but that sounds like a good first step to me, would be to ensure that we can make a jvm analysis pass on cljc files while ignoring all the cljs reader conditionals. This is certainly not something we would want to release since it could cause all sorts of issues post-refactoring (like stale symbols in cljs files and so on), but it'd help make refactor-nrepl useful for people working in cljc codebases and willing to get involved with an in-progress fork/PR. Are there any reasons besides the one I mentioned why we wouldn't want to do that? If not could you expand on what would need to happen for this to work?

Lastly I'll say that I have very limited yak-shaving time over the next few months, so progress on this will be slow.

Apr 16 '17 23:04 julienfantin

Let's say you want to start with adding support for cljs for the refactorings that use find-used-locals. This would enable quite a few refactorings (see the list earlier in the thread). You'd basically branch here which checks if the file is a cljs file, and create two new functions:

One containing the existing clojure code, which we're leaving completely unchanged
A new function which contains all sorts of wonderful and magical new code to make this all work for cljs, using jvm.tools.analyzer.

This sounds like a valuable step to do during research / early implementation. We've been using snapshots to dogfood new features, so as soon as something is in a reasonable state, with some benefit, releasing it to early adopters is cool.

@benedekfazekas is planning to cut a stable release sometime later this month, which makes it easy for people to opt-out of any alpha-quality features that are coming to cljs.

Lastly I'll say that I have very limited yak-shaving time over the next few months, so progress on this will be slow.

Slow is infinitely better than nothing!

Apr 16 '17 23:04 expez

And there's the option of writing a simplified portable parser - something that won't really try to expand macros and so on. On having more of the refactorings depend on just the loaded code instead of analyzing all the source code.

In general, there's also the bigger question of how to better support ClojureScript in CIDER as well. Most of the limitations come from the stagnation in nREPL's development - in theory if first call support for ClojureScript was added there instead of relying on piggieback a lot of things would become simpler. Also in theory - supporting some evaluation backend that nREPL would simplify the ClojureScript situation a lot. But that's also a lot of work that no one really wants/has the time to tackle.

Apr 17 '17 07:04 bbatsov

I like that fact that there is a plan formulating how attack this. and agree with @expez that find-used-locals is a nice candidate for POCs. are you guys sure about jvm.tools.analyzer tho? it seems to be an inactive and out of date project to me.. last commit with some real content is 8 months old... am I missing something?

Apr 18 '17 05:04 benedekfazekas

Guess you can always ping the maintainer about its state.

Apr 18 '17 06:04 bbatsov

also there are several self hosted cljs options now. they might worth a check.

Apr 18 '17 06:04 benedekfazekas

it seems to be an inactive and out of date project to me.. last commit with some real content is 8 months old... am I missing something?

It's just a wrapper around the clojure and cljs compiler, so unless the API of those two change, the wrapper doesn't have to be updated.

Apr 18 '17 07:04 expez

ah right. thanks for clarification @expez

Apr 18 '17 07:04 benedekfazekas

Unfortunately haven't had time to work on this at all, but I did gain some insight on what will need to happen, so I'll leave some notes here, if not only for myself:

Add support for `end-column` and `end-line` in the ClojureScript analyzer

The ClojureScript analyzer currently doesn't include end-line and end-column in the AST's :env, but refactor-nrepl's own analyzer namespace relies on that information, notably for node-at-loc?.

I don't know that there is a way to replicate the existing logic around node-at-loc? without being able to bound an ast node with a buffer position, and even if there were it would probably be non-trivial and prevent reusing analyzer logic between clj and cljs.

There is a simpler solution I think, which consist in writing a simple patch for the Clojurescript analyzer. This patch will need to replicate getting the line data from the environment for end-line and end-column and should take care of updating all the call sites (about a dozen or so) to propagate that information throughout the AST.

I mentioned this on the cljs-dev slack channel and it seems like it would be fairly easy to get this merged.

Use `clojure.tools.reader.reader-types/indexing-push-back-reader`

jvm.tools.analyzer uses a vanilla pusback reader, which I don't think would get us the end-line and end-column metadata that the patch described above would use. Fortunately the analyze-ns function is quite small and easy to change for that purpose.

It is not clear at this point whether there is anything else in that project that we would use and would warrant patching the project rather than rolling our own wrapper for the cljs analyzer.

May 25 '17 03:05 julienfantin

ClojureScript patch submitted: https://dev.clojure.org/jira/browse/CLJS-2051

May 26 '17 13:05 julienfantin

awesome!

May 26 '17 13:05 benedekfazekas

also there are several self hosted cljs options now. they might worth a check.

I have to say that when I was wondering how to do this as well I basically ended up considering the above the easiest approach. In particular, Antonio Monteiro has already ported the ClojureScript analyzer to self-host (node) and this would be a very valuable library to spin off.

I might be the only one here, but I think, and I am really trying hard in orchard, that a better approach would be to have something that works with or without the JVM so that eventually it could be used for static analysis and therefore compete with Cursive in terms of functionality like jump to def and other things without waiting n seconds for a REPL. Purely personal battle I know :smile:

Spinning off a new analyzer library has so many upsides that I am considering doing it myself...maybe merging code from tools.analyzer.js as well. However, who knows when I will have time :clock1:

I have also looked into https://github.com/thunknyc/crntl and I had an old idea of using universal-ctags for static analysis -> https://github.com/thunknyc/crntl/issues/9. All well and good but no POC so far :disappointed:.

By the way this is a great discussion so thanks @julienfantin!

Jan 04 '19 18:01 arichiardi

The conforming work of the ClojureScript analyzer AST to the tools.analyzer spec which shipped as part of ClojureScript 1.10.439 doesn't help here?

https://dev.clojure.org/jira/browse/CLJS-1461 https://github.com/clojure/clojurescript/blob/master/changes.md#110439

Jan 05 '19 23:01 kommen

The conforming work of the ClojureScript analyzer AST to the tools.analyzer spec which shipped as part of ClojureScript 1.10.439 doesn't help here?

Think that patch was news to all of us :) It should be largely plug and play, I think!

Jan 05 '19 23:01 expez

yeah sounds like it. good shout @kommen. we discussed this with @vemv in the #cider room on clj slack.

Jan 06 '19 07:01 benedekfazekas

while playing with this, realised that https://dev.clojure.org/jira/browse/CLJS-2051 actually never got merged

Apr 11 '19 19:04 benedekfazekas

after working around the missing end-line, end-column information the next hurdle seems to be that the stages of macro expansion are not persisted in the cljs AST, there is no raw-forms tag. the code expects this tag to be present to be able to look at the original form in case macro expansion happened.

Apr 12 '19 09:04 benedekfazekas

see POC for find-locals support cljs on cljs-support branch. thoughts?

Apr 13 '19 22:04 benedekfazekas

I like the POC :smile:! Also that piece of functionality could be a very good candidate for the orchard!

Apr 15 '19 16:04 arichiardi

@arichiardi Although maybe we can achieve pretty much the same result using rewrite-clj and just ignoring some of the complexity that goes alongside tools.jvm.analyzer and ClojureScript's own analyzer. Looking at projects like clj-kondo it seems you can go pretty far with rewrite-clj and without having to evaluate any code to build the AST.

Apr 15 '19 16:04 bbatsov

refactor-nrepl refactor-nrepl copied to clipboard

Outline work needed for cljs and cljc support

A bit about options on potential solutions

Add support for end-column and end-line in the ClojureScript analyzer

Use clojure.tools.reader.reader-types/indexing-push-back-reader

refactor-nrepl
refactor-nrepl copied to clipboard

Add support for `end-column` and `end-line` in the ClojureScript analyzer

Use `clojure.tools.reader.reader-types/indexing-push-back-reader`