humanify Refactor `visitAllIdentifiers` to leverage `babel`'s built in scope tracking better

This might be my own naive misunderstanding of how visitAllIdentifiers works currently, but from a quick skim through it feels kind of convoluted in how it's using findScopes, hasVisited, markVisited, etc; when, if I remember correctly, babel's AST parser + @babel/traverse has built in support for scope tracking/renames/etc that I believe make most of this redundant?

https://github.com/jehna/humanify/blob/main/src/plugins/local-llm-rename/visit-all-identifiers.ts

I haven't tested this specifically, but after a few rounds of iteration with GitHub Copilot, this is feeling pared down/closer to how I would have expected this feature to have been implemented, leveraging more of Babel's scope tracking/etc:

import { parseAsync, transformFromAstAsync, NodePath } from "@babel/core";
import * as babelTraverse from "@babel/traverse";
import { Identifier, toIdentifier } from "@babel/types";

const traverse: typeof babelTraverse.default.default = (
  typeof babelTraverse.default === "function"
    ? babelTraverse.default
    : babelTraverse.default.default
) as any; // eslint-disable-line @typescript-eslint/no-explicit-any -- This hack is because pkgroll fucks up the import somehow

type Visitor = (name: string, scope: string) => Promise<string>;

export async function visitAllIdentifiers(
  code: string,
  visitor: Visitor,
  contextWindowSize: number
) {
  const ast = await parseAsync(code, { sourceType: "unambiguous" });
  if (!ast) {
    throw new Error("Failed to parse code");
  }

  traverse(ast, {
    Identifier: async (path) => {
      const surroundingCode = await scopeToString(path, contextWindowSize);
      const renamed = await visitor(path.node.name, surroundingCode);
      if (renamed !== path.node.name) {
        let safeRenamed = toIdentifier(renamed);
        while (
          path.scope.hasBinding(safeRenamed) ||
          path.scope.getProgramParent().hasBinding(safeRenamed)
        ) {
          safeRenamed = `_${safeRenamed}`;
        }
        path.scope.rename(path.node.name, safeRenamed);
      }
    }
  });

  const stringified = await transformFromAstAsync(ast);
  if (stringified?.code == null) {
    throw new Error("Failed to stringify code");
  }
  return stringified.code;
}

async function scopeToString(
  path: NodePath<Identifier>,
  contextWindowSize: number
) {
  const surroundingPath = closestSurroundingContextPath(path);
  const code = `${surroundingPath}`; // Implements a hidden `.toString()`
  if (code.length < contextWindowSize) {
    return code;
  }
  if (surroundingPath.isProgram()) {
    const start = path.node.start ?? 0;
    const end = path.node.end ?? code.length;
    if (end < contextWindowSize / 2) {
      return code.slice(0, contextWindowSize);
    }
    if (start > code.length - contextWindowSize / 2) {
      return code.slice(-contextWindowSize);
    }

    return code.slice(
      start - contextWindowSize / 2,
      end + contextWindowSize / 2
    );
  } else {
    return code.slice(0, contextWindowSize);
  }
}

function closestSurroundingContextPath(
  path: NodePath<Identifier>
): NodePath<Node> {
  const programOrBindingNode = path.findParent(
    (p) => p.isProgram() || path.node.name in p.getOuterBindingIdentifiers()
  )?.scope.path;
  return programOrBindingNode ?? path.scope.path;
}

That said, I could be missing/misunderstanding some nuance of how it's currently implemented that isn't handled properly in this new version, etc.

Feb 18 '25 09:02 0xdevalias

@babel/traverse has built in support for scope tracking/renames/etc that I believe make most of this redundant?

All the magic happens in smallestScope.scope.rename(smallestScopeNode.name, safeRenamed);, the rest of the code is only used for the LLM to get some context of how the variable is used

Feb 18 '25 15:02 j4k0xb

All the magic happens in smallestScope.scope.rename(smallestScopeNode.name, safeRenamed);

@j4k0xb Hrmm, fair.

Skimming through the logic:

visitAllIdentifiers
- calls babel's parseAsync to get the AST
- creates Set's for renames / visited
- calls findScopes
  - traverse's the AST and for each BindingIdentifier node, calls closestSurroundingContextPath and calculates the pathSize, then stores that in scopes as [nodePath: NodePath<Identifier>, scopeSize: number][]
  - sorts the scopes by their pathSize
  - then .map's them to just their nodePath
- extracts numRenamesExpected from scopes.length
- loops through scopes
  - checks hasVisited
  - gets the surroundingCode with scopeToString
  - calls the passed in visitor function to get the renamed variable
  - does some safeRenamed checking
  - calls smallestScope.scope.rename
  - calls markVisited
  - outputs the onProgress

It's been a while since I was looking at this PoC code of mine, so I might be misremembering it, but in this one I was using Scopable with babel.traverse to output all of the scopes:

https://github.com/0xdevalias/poc-ast-tools/blob/05a34a9e74be6e686a78a49a02b6750ce8796e01/babel_v1_0_scope_debug.js#L68-L72
- ```
babel.traverse(ast, {    
  Scopable(path) {    
    debugLogScopeBindings({ path })    
  }    
});
```
- https://babeljs.io/docs/babel-types
  - https://babeljs.io/docs/babel-types#aliases
    - https://babeljs.io/docs/babel-types#scopable
      - A cover of FunctionParent and BlockParent.
      - https://babeljs.io/docs/babel-types#functionparent
        
        A cover of AST nodes that start an execution context with new VariableEnvironment. In other words, they define the scope of var declarations. FunctionParent did not include Program since Babel 7.
      - https://babeljs.io/docs/babel-types#blockparent
        
        A cover of AST nodes that start an execution context with new LexicalEnvironment. In other words, they define the scope of let and const declarations.

And there was also:

path.scope.bindings
path.scope.getAllBindings()

And the scopes seemed to have uid's and types as well:

binding.scope.uid
binding.scope.path.type

And path.scope.dump() was useful for debugging too:

https://github.com/0xdevalias/poc-ast-tools/blob/05a34a9e74be6e686a78a49a02b6750ce8796e01/babel_v1_3.js#L146-L180

I can't remember off hand if there was a way to just directly access all of the scopes/bindings from the original parsed AST without even needing to traverse it.

I'd need to dig deeper into it to be certain, but my immediate gut feel here is that the current code here in humanify feels closer to my original explorations in my own code, where I was doing a lot of reinventing the wheel to try and process scope/rename things that babel already handled for me.

Like here is an earlier iteration where I was manually ensuring I only processed binding renames once:

https://github.com/0xdevalias/poc-ast-tools/blob/05a34a9e74be6e686a78a49a02b6750ce8796e01/babel_v1_1.js#L55-L111

And yet in later versions I switched away from that because there was no need to:

https://github.com/0xdevalias/poc-ast-tools/blob/05a34a9e74be6e686a78a49a02b6750ce8796e01/babel_v1_3_clean.js#L353-L400

Like if you take all of the debug logging out of that, I think the main implementation ends up being:

babel.traverse(ast, {
  Scopable(path) {
    Object.entries(path.scope.bindings).forEach(([name, binding]) => {
      const newName = generateNewName(path, binding, name);
      if (name !== newName) {
        path.scope.rename(name, newName);
      }
    });
  },
});

(Then obviously add back in the 'safe rename' checks using path.scope.hasBinding(newName)/similar)

See also:

https://github.com/jamiebuilds/babel-handbook/blob/master/translations/en/plugin-handbook.md#scopes
https://github.com/jamiebuilds/babel-handbook/blob/master/translations/en/plugin-handbook.md#toc-scope
- https://github.com/jamiebuilds/babel-handbook/blob/master/translations/en/plugin-handbook.md#generating-a-uid
  - path.scope.generateUidIdentifier
  - path.scope.generateUidIdentifierBasedOnNode
  - path.scope.rename
    - path.scope.rename("n", "x")
    - Alternatively, you can rename a binding to a generated unique identifier
      - path.scope.rename("n")

Feb 21 '25 05:02 0xdevalias

Some tangentially related notes I wrote on email recently:

it is very slow due to the babel traverse

it takes ~90s on my machine locally with phi-4-mini-iq3-xxs when i run it through humanify (which uses @babel/traverse), as opposed to just running it through direct chat locally (prompt: "hey make this readable") takes like 5s. so i know @babel/traverse and the visitAllIdentifiers fn is slowing things down

So I’m not certain about this off the top of my head; but I suspect that it might not be babel/traverse or the AST parsing stuff that makes things slow; but more that I believe the calls to the LLM are happening within that AST traversal, and chunks of the code are uploaded individually/sequentially; and blocking till that LLM call returns. I believe there’s another issue on the humanify repo (see https://github.com/jehna/humanify/issues/167) that talks a bit about parallelising things.

So I would first see if just stubbing out the actual LLM call part of that gives you the speedups you were hoping to see as a first test. If it’s still too slow even without that part; then my next guess would be that the code within that traverse that tries to decide what/how much context to send to the LLM may not be as efficient as it could be. I wasn’t necessarily thinking about speed when I opened the issue related to cleaning up that bit of the code; but just more that it seemed more complex than my gut feel said it needed to be; and so I found it harder to reason about than I should have. So it wouldn’t surprise me if there might be a performance degradation within that bit.

When you say it takes like 90s with traverse compared to like 5s directly in chat; how big is the code sample you’re providing? Is it like a real world web app, or a tiny snippet? I believe one of the reasons the code is parsed and traversed as it currently is is so that we can control how much context is given to the LLM at any one time; and not confuse/dilute it/overflow its context window/etc.

Another random guess as to something that might potentially impact the speed of things running through the traverse code vs standalone chat is if the humanify code isn’t running the model on the GPU for some reason, but is falling back to CPU/similar.

It’s sort of hard to guess/reason about things beyond that at this stage. Sort of feels like you’d need to try some of the things to narrow stuff down more/get more insight into exactly what part of it is making it slow/etc.

May 02 '25 00:05 0xdevalias

Any thought given to using the TypeScript Compiler API/LSP?

courtesy of ChatGPT

import * as vscode from 'vscode';
import ts from 'typescript';
import * as fs from 'fs';

export function activate(context: vscode.ExtensionContext) {
  const disposable = vscode.commands.registerCommand('tsRenameAllIdentifiers.renameAll', async () => {
    const editor = vscode.window.activeTextEditor;
    if (!editor) {
      vscode.window.showErrorMessage('No active editor.');
      return;
    }

    const document = editor.document;
    if (document.languageId !== 'typescript' && document.languageId !== 'typescriptreact') {
      vscode.window.showErrorMessage('Not a TypeScript file.');
      return;
    }

    const sourceCode = document.getText();
    const fileName = document.uri.fsPath;

    // Parse the file with TypeScript
    const sourceFile = ts.createSourceFile(fileName, sourceCode, ts.ScriptTarget.Latest, true);

    const positionsToRename: { name: string, offset: number }[] = [];
    const seen = new Set<string>();

    // Visit nodes recursively
    function visit(node: ts.Node) {
      if (ts.isIdentifier(node)) {
        const name = node.text;
        if (seen.has(name)) return;
        seen.add(name);
        positionsToRename.push({ name, offset: node.getStart() });
      }
      ts.forEachChild(node, visit);
    }

    visit(sourceFile);

    const editsToApply: vscode.WorkspaceEdit[] = [];

    for (const { name, offset } of positionsToRename) {
      const pos = document.positionAt(offset);
      const newName = `renamed_${name}`;

      try {
        const edit = await vscode.commands.executeCommand<vscode.WorkspaceEdit>(
          'vscode.executeDocumentRenameProvider',
          document.uri,
          pos,
          newName
        );

        if (edit) {
          editsToApply.push(edit);
        }
      } catch (err) {
        console.warn(`Skipping ${name}:`, err);
      }
    }

    // Apply all collected edits sequentially
    for (const edit of editsToApply) {
      await vscode.workspace.applyEdit(edit);
    }

    vscode.window.showInformationMessage(`Renamed ${positionsToRename.length} identifiers.`);
  });

  context.subscriptions.push(disposable);
}

export function deactivate() {}

Jun 02 '25 02:06 brianjenkins94

Any thought given to using the TypeScript Compiler API/LSP?

@brianjenkins94 For what specifically?

Resurrecting the link from your edit history to try and get more context:

https://microsoft.github.io/language-server-protocol/specifications/lsp/3.17/specification/#textDocument_rename
- Rename Request
- The rename request is sent from the client to the server to ask the server to compute a workspace change so that the client can perform a workspace-wide rename of a symbol.

Babel's AST parsing already provides support for renaming identifiers, which is what humanify is currently using; so I'm not sure what benefits/differences you're thinking would be possible with this?

courtesy of ChatGPT

@brianjenkins94 While I appreciate the intent behind this, based on a quick skim through it, I don't believe it does anything new/different/better than what we're already able to (and doing) directly with Babel's own AST parsing + identifier renaming.

You may find this breakdown / explanation / Babel based example helpful to understand this deeper:

https://chatgpt.com/share/683d14c2-1f54-8008-a4a9-fec8f6717f7a

Jun 02 '25 03:06 0xdevalias

The current implementation doesn't rename class or static fields so I thought leveraging the TypeScript API could yield better, more contextually-aware results.

Jun 02 '25 03:06 brianjenkins94

The current implementation doesn't rename class or static fields

@brianjenkins94 Ah, ok. This would seem to be the more relevant issue for that, so your comment on this one was somewhat disconnected from that context:

https://github.com/jehna/humanify/issues/183

While you may have already seen it, my original exploration into the AST was here:

https://github.com/jehna/humanify/issues/183#issuecomment-2665020171

And this comment seemed to suggest that Babel's default scope tracking doesn't cover private class fields/methods, but linked to one potential solution using Babel's own babel-helper-create-class-features-plugin's privateNameVisitorFactory:

https://github.com/jehna/humanify/issues/183#issuecomment-2666094327

I haven't deeply explored that option, nor whether TypeScript's LSP/etc would do better in this case.

But in any case, if that were to be explored deeper, the other issue would be a more appropriate place for that to be done/discussed than here.

Edit: I have copied across the relevant parts of this discussion to that issue:

https://github.com/jehna/humanify/issues/183#issuecomment-2928555080

Jun 02 '25 03:06 0xdevalias