oxc icon indicating copy to clipboard operation
oxc copied to clipboard

Add separate scopes for function bodies

Open overlookmotel opened this issue 1 year ago • 6 comments

How it is now

Currently functions only have a single scope which contains bindings for:

  • Function name (for function expressions)
  • Parameters
  • Declarations in body of function

In SemanticBuilder we manage to correctly resolve IdentifierReferences within function params with an extra call to resolve_references_for_current_scope after walking the function params:

https://github.com/oxc-project/oxc/blob/b4407c4e9ae4fd35637c74452eb83997f9494875/crates/oxc_semantic/src/builder.rs#L1955-L1963

i.e. We resolve references in params before any bindings from body of function are added to the scope.

The problem

In the transformer, we also need to resolve references, but we don't have the advantage that we do in SemanticBuilder of constructing ScopeTree in order. It already exists, and we have to make sense of it.

Here's an example which doesn't work at present:

import React from 'react';
function foo(elem = <Component />) {
    let React;
}

In classic mode, JSX transform converts this to:

import React from 'react';
function foo(elem = React.createElement(Component, null)) {
    let React;
}

We need to resolve the binding for the inserted reference React. Currently, it resolves incorrectly to let React inside the function body, as resolution goes up the scope tree and finds a binding for React in the function's scope.

Additional problem

In sloppy mode code, there's also an edge case where you can have 2 bindings for arguments in a function - #4232.

Obvious solution

The obvious solution is to add an additional scope to all function bodies.

Advantage: This is a simple solution which has no "surprises".

Disadvantages:

  • More scopes = likely perf hit when resolving references for which the binding is outside current function.
  • Makes it more complicated to detect illegal redeclarations (function foo(x) { let x; }).

Alternative solution

  • Add SymbolFlags::Parameter (or maybe existing ScopeFlags::FunctionScopedVariable will do).
  • Track in Traverse if you're inside a function's parameters or not.
  • When resolving a reference, if inside a function's parameters, for the first scope encountered with ScopeFlags::Function flag, do not match bindings unless they have a SymbolFlags::Parameter flag.

Actually, it's a bit more complex than that, in order to handle cases like this:

const x = 1;
function foo(
    y = function bar(z = x) {
        const x = 3;
    }
) {
  const x = 2;
}

The x in z = x needs to ignore both the binding in bar and the binding in foo and resolve to top level const x = 1. So rather than an "in function params" yes/no flag, would need a "function parameter depth" counter (in this case depth is 2).

Advantage: Maintain lower number of scopes = probably better perf.

Disadvantage: The complexity of reference resolution could make it error-prone.

Which?

The more complicated solution should work (though it wouldn't solve the 2 x arguments bindings edge case). But it's somewhat complex.

Personally I would side with trying the simple solution first and measure how much it impacts performance. Perhaps it's negligible. If so, great, but if not then we can look at the more complicated solution.

Any thoughts?

overlookmotel avatar Aug 20 '24 16:08 overlookmotel

@Boshen @Dunqing would appreciate your input on this when you have time.

overlookmotel avatar Aug 22 '24 12:08 overlookmotel

Actually the complex version would need to be more complicated than I thought, to deal with:

const x = 11, y = 12;
function foo(
    y = function bar() {
        const x = 31;
        const xx = x, yy = y; // 31, 12
    }
) {
  const x = 21, y = 22;
}

When resolving x and y in xx = x, yy = y, you have to:

  • Include bindings in function body of bar (to correctly resolve x).
  • Exclude bindings in function body of foo (to correctly resolve y).

So it'd need something more complicated than a "function parameter depth" counter. It'd need a stack of some kind to record an "am I in params?" flag for each function you're currently in. Still doable, but a bit complicated.

overlookmotel avatar Aug 22 '24 12:08 overlookmotel

You are going to hate me for showing you the following foreign language:

In https://tc39.es/ecma262/#sec-functiondeclarationinstantiation:

  1. Let formals be func.[[FormalParameters]].

  2. Let hasParameterExpressions be ContainsExpression of formals.

  3. If strict is true or hasParameterExpressions is false, then

  4. Else, a. NOTE: A separate Environment Record is needed to ensure that bindings created by direct eval calls in the formal parameter list are outside the environment where parameters are declared. b. Let calleeEnv be the LexicalEnvironment of calleeContext. c. Let env be NewDeclarativeEnvironment(calleeEnv). d. Assert: The VariableEnvironment of calleeContext and calleeEnv are the same Environment Record. e. Set the LexicalEnvironment of calleeContext to env.

tldr we need a new scope for binding initializers, this scope points to the parent scope.

Boshen avatar Aug 24 '24 02:08 Boshen

What is this mumbo jumbo?? 😭 OK, I'll try to digest this and will come back when I've (hopefully) understood it.

overlookmotel avatar Aug 24 '24 08:08 overlookmotel

Another case where you can have 2 bindings with same name in a function (and this one is possible in strict mode too):

'use strict';
const f = function foo(foo2 = foo) {
    let foo = 123;
    return [foo, typeof foo2];
};
console.log(f()); // Logs [ 123, 'function' ]

I am now in favour of separate scopes for function bodies. I believe:

  1. It's the only way to be correct in edge cases like the above.
  2. (more importantly) Without them, the problem of resolving references in transformer, while possible, is pretty complicated, and that complexity will bring with it a performance hit.

overlookmotel avatar Sep 23 '24 13:09 overlookmotel

Another problem we have at present:

export type T = number;
function foo(n: number): T {
  type T = string;
  return n;
}

T in return type needs to resolve to top-level export type T, not type T inside function body.

Whereas here it needs to resolve to type param (which is bound in function's scope):

function foo<T>(n: T): T {
  return n;
}

i.e. return type is resolved within scope of the function but not within scope of the function body.

overlookmotel avatar Oct 09 '24 15:10 overlookmotel

Close as stale.

Boshen avatar Aug 13 '25 04:08 Boshen

I still think this is something we should consider in future, so transferred to backlog: https://github.com/oxc-project/backlog/issues/176

overlookmotel avatar Aug 13 '25 12:08 overlookmotel