Untracked `arguments` in function scopes.
Related to this: https://github.com/oxc-project/oxc/issues/3374#issuecomment-2223483123
We do not keep track the arguments object in functions, It causes false positives in linters as of now and it can impose more serious limitations or errors in other crates(transformers and minifier come to mind as we might shadow the function arguments by mistake).
Since you are heavily invested in the scopes right now, @Dunqing What do you think?
Funny, I was wondering how we handle arguments just this morning. Sounds like we don't.
I can see 3 options:
- Add
argumentsbinding to all function scopes eagerly. - Add
argumentsbinding to function scopes lazily, only once a reference to it is found in function body (or params). - Special case
argumentssoIdentifierReferencefor it would have noSymbolId.
Personally, I would tend towards (2) - simple, and avoids perf hit of (1).
NB: arguments is almost like an invisible function param, but there is one oddity in sloppy mode:
function f1(x) {
// SyntaxError: Identifier 'x' has already been declared
let x;
}
function f2(arguments) {
// SyntaxError: Identifier 'arguments' has already been declared
let arguments;
}
function f3() {
// No problem
let arguments;
}
Also, you can have 2 bindings for arguments in a function:
function f(x, y = arguments) {
let arguments = 456;
console.log(y[0]); // Logs 123
console.log(arguments); // Logs 456
}
f(123);
So we'd need an extra scope for function body.
I think the option 3 isn't a really good approach here. I'm torn between 1 and 2.
The latter seems like a good thing for performance but I'm afraid it would cause issues later on because javascript is whimsical at times:
function f() {
eval("console.log(arguments)");
}
f("a", "b", "c")
With the second approach, we still have the same issue of shadowing arguments and generating the wrong code!
He had to go and bring eval() into it... 😄
But even then, I can't actually see how that can go wrong. eval() will (I assume) prevent mangling names of all bindings which it can "see". And we can just ban creating new bindings called arguments in the transformer (that's illegal in strict mode anyway).
https://github.com/eslint/eslint-scope/blob/8082caa1a0f9aae0894a0a01fef9efa7a5e509f6/lib/scope.js#L671
He had to go and bring
eval()into it... 😄
😆
But even then, I can't actually see how that can go wrong.
eval()will (I assume) prevent mangling names of all bindings which it can "see". And we can just ban creating new bindings calledargumentsin the transformer (that's illegal in strict mode anyway).
You are right that we shouldn't ever mangle arguments if it isn't user-defined. I'm just saying since arguments are a context-sensitive identifier and javascript allows for weird ways to access variables through non-conventional ways it might cause issues further down the road.
Maybe no sane person uses new Function or eval in the production code, But javascript frameworks might generate such code and expect us to bundle it correctly for them.
I might be overthinking this.
We should add an argument symbol to the function scope. Both TypeScript and typescript-eslint do the same thing
What do you think of this example?
// Sloppy mode
function f(x, y = arguments) {
let arguments = 456;
console.log(y[0]); // Logs 123
console.log(arguments); // Logs 456
}
f(123);
This demonstrates there can be 2 distinct bindings for arguments in a function. arguments in y = arguments references a different binding from let arguments = 456.
I was reading the eslint-scope tests today when saw an oddity I couldn't figure.
It contains arguments in the global scope. Your comment made me go and look it up, Turns out they always fake arguments in all non-function scopes. Here's my educated guess about how it is done there:
- Step 1: It visits each node and starts adding their refs and IDs to the scope manager. After doing the same for the inner nodes it would then call
closeon thereferencer. - Step 2: Close the scope(s) for the visited node.
- Step 3: Try to resolve the identifier
- Step 4: Do not resolve default parameters.
- Step 5: If not resolved delegate it to the upper scope for resolution.
- Step 6: Continue to hit the global scope.
- Step 7: Resolve implicit variable declarations
- Step 8: Call the vanilla
super.__closeat the end of global scope closing. - Step 9: Everything is valid at this stage so something like
argumentswould be registered as implicit/builtin/global/dangling identifier.
With these steps, your code would work similarly to this:
let arguments = [123];
function f(x, y = arguments) {
let arguments = 456;
console.log(y[0]); // Logs 123
console.log(arguments); // Logs 456
}
f(123);
Treating arguments as external to the function just seems weird. And it can't handle the case where there's another arguments binding in scope above e.g.:
// Sloppy mode
let arguments;
function f(x, y = arguments) {
let arguments = 456;
}
Here we have 3 bindings for arguments, and only 2 scopes to put them in. I know this is an edge case, and will be very rare in practice, but Oxc aims for absolute correctness.
Only solutions I can see are:
- Always add an extra scope for function body.
- Only add an extra scope for function body if there's a reference to
argumentsin function params. - Treat implicit
argumentsas a special case and don't create a binding for it.
(1) is simpler and more conceptually correct, but imposes quite a large penalty. Every function generates 2 scopes, and functions are a common construct, so it will slow down symbol resolution significantly. (3) has been rejected. So I would lean towards (2).
I personally would prefer to find a way to avoid unnecessary scopes, Can't we declare arguments as a hoisted variable on the function scope? with that, we should be able to redeclare it in the function body.
But that's just my preference and I have nothing against your approach here, It just feels wrong to add scopes left and right as a means to simplify stuff. As you said correctness is more important. I'm not a huge fan of premature optimization.
I just have a quick question. We are wrapping the whole function in the scope right now. Is this going to change? Is the scope going to be entered after the parameters?
To clarify, what I'm saying about option (2) above is:
- By default, only create 1 scope for the function (as now).
- Add binding for
argumentsto that scope. - Only if
argumentsis referenced in the function's params, create a further scope for function body. This will be very rare.
i.e. Handle the weird uncommon edge case as a de-opt. No overhead from extra scope in common case, but maintain complete correctness.
To clarify, what I'm saying about option (2) above is:
* By default, only create 1 scope for the function (as now). * Add binding for `arguments` to that scope. * _Only if_ `arguments` is referenced in the function's params, create a further scope for function body. This will be very rare.i.e. Handle the weird uncommon edge case as a de-opt. No overhead from extra scope in common case, but maintain complete correctness.
Actually, it's not as bad as I initially thought. I'm just afraid of something like this.
When you are accessing the Ancestors of a scope you have to check if that scope is a legit or a fake one to prevent issues in places when we want to get the "actual" parent with something like scope.ancestors(xxx).nth(1) or scope.get_parent_id(xxx).
It also means that there are now some really obscure edge cases with arguments that won't get tested, It is not just about parent scope, With this approach, you'd have some identifiers that are pointing to the parent of a function's body but aren't the actual scope of the enclosing node. I feel it would cause a lot of confusion and heisenbugs for both us(especially contributions that don't get reviews by people who are aware of this) and our crate users.
What are your thoughts on this? Even if we do all of these checks doesn't it mean we won't get much performance out of this in the long run(as we start to use scope more and more)?
Edit:
I've checked the spec and even there, They only create a new lexical environment if strict is false to prevent eval from creating a new binding. Otherwise, they would always use the lexical environment of the function's callee. I know these aren't apple to apple comparisons as environments are a runtime concept and aren't 1:1 with scopes. But it might help.
But maybe I'm missing something here.
https://tc39.es/ecma262/multipage/ordinary-and-exotic-objects-behaviours.html#sec-functiondeclarationinstantiation https://tc39.es/ecma262/multipage/ordinary-and-exotic-objects-behaviours.html#sec-arguments-exotic-objects
I found out how TypeScript handles arguments. TypeScript adds an arguments symbol but does not add it in scopes. https://github.com/microsoft/TypeScript/blob/d8086f14b6b97c0df34a0cc2f56d4b5926a0c299/src/compiler/utilities.ts#L11361-L11380
I found out how TypeScript handles
arguments.TypeScriptadds anargumentssymbol but does not add it in scopes. https://github.com/microsoft/TypeScript/blob/d8086f14b6b97c0df34a0cc2f56d4b5926a0c299/src/compiler/utilities.ts#L11361-L11380
Nice find. I also see tsc defines
globals: SymbolTable;
argumentsSymbol: Symbol;
requireSymbol: Symbol;
TypeScriptadds anargumentssymbol but does not add it in scopes.
TS may do it, but it seems like a bad idea to me. Rather confusing, no?
For the record, I added this to the mangler
https://github.com/oxc-project/oxc/blob/0a2f68756f38f8ef003425e856061502b0c97a0a/crates/oxc_mangler/src/lib.rs#L229-L231
arguments can be renamed if it's a local const, let, or function declaration declared in function scope.
// Sloppy mode
function outer() {
let arguments = 123;
console.log(arguments);
}
function outer2() {
function arguments() {}
console.log(arguments);
}
But not if it's a var, or hoisted function declaration:
// Sloppy mode
function outer() {
console.log(arguments); // Logs [1, 2, 3]
var arguments = 456;
console.log(arguments); // Logs 456
}
outer(1, 2, 3);
function outer2() {
console.log(arguments); // Logs [1, 2, 3]
{
function arguments() {}
}
console.log(typeof arguments); // Logs 'function'
}
outer2(1, 2, 3);
Personally I'm in favour of adding scopes for function bodies anyway for other reasons (#5017), so that'd resolve some of the questions above.
Any progress on this? no-undef rule still report an error in my project.
Any progress on this? no-undef rule still report an error in my project.
Please file an issue. This is just an improvement, not a blocker for any rules. cc @camc314
EDIT: Fixed in https://github.com/oxc-project/oxc/pull/13763
Just to note that at some point in next few months, we'll be utilizing Semantic on JS side for linter JS plugins (see https://github.com/oxc-project/oxc/issues/9905#issuecomment-3289389212).
In order to do that, we'll need to make some changes to Scoping to make it compatible with ESLint.
ESLint's API includes scope.isArgumentsMaterialized() which returns whether a function's arguments binding is used.
To support that API, we'll need to address this issue somehow.