UglifyJS icon indicating copy to clipboard operation
UglifyJS copied to clipboard

Feature suggestion: optimizing `var` definitions

Open chrisaljoudi opened this issue 12 years ago • 24 comments

Hi! I came across a case where there's a possible trick that can reduce size significantly. Take the following piece of code:

var x = 12345678, y = 12345678;

It compresses to:

var x=12345678,y=12345678;

Can't it be optimized to the following:

var x=12345678,y=x;

Or is there something I'm missing? I think it should be available as an unsafe optimization.

Thanks!

chrisaljoudi avatar Dec 22 '12 16:12 chrisaljoudi

Also, if you could determine that x was read only and never used in scope after y had been mutated, or vice versa (or even that both were read only) you could potentially optimise one of the two variables away completely.

Have to be careful of inner closures, probably pretty unsafe, could be fun though! :stuck_out_tongue:

mal avatar Dec 22 '12 20:12 mal

The optimisation above would be completely safe though

ForbesLindesay avatar Dec 22 '12 20:12 ForbesLindesay

The optimization is only safe when the vars contain primitives, isn't it?

staabm avatar Dec 23 '12 12:12 staabm

@staabm I think it's safe when the expression-node being assigned to the variables has .has_side_effects() false.

chrisaljoudi avatar Dec 23 '12 13:12 chrisaljoudi

In general it is safe to optimise:

var x = E, y = E;

to

var x = E, y = x;

For all E provided that E:

  1. has no side effects
  2. returns the same value every time

e.g. The following would not be safe to optimise even though Math.random() has no side effects:

var x =  Math.random(), y =  Math.random();

The following would be though:

function get() { return 5; }
var x =  get(), y =  get();

The follwoing would not because {} === {} is false

function get() { return {}; }
var x =  get(), y =  get();

but the following would be:

var val = {};
var x = val, y = val;

ForbesLindesay avatar Dec 23 '12 13:12 ForbesLindesay

@ForbesLindesay You're definitely right, sorry for being inaccurate.

chrisaljoudi avatar Dec 23 '12 13:12 chrisaljoudi

Interestingly it would be safe with certain very specific side effects, but this is probably more in depth analysis than we'd want to do:

var got = false;
function get() {
  return got = true;
}
var x =  get(), y =  get();

because although get has side effects, it doesn't have a side effect if you run it a second time having not reset got to false.

ForbesLindesay avatar Dec 23 '12 13:12 ForbesLindesay

No worries @abody these things are tricky at the best of times, my initial intuition which I started writing down was completely false.

ForbesLindesay avatar Dec 23 '12 13:12 ForbesLindesay

@ForbesLindesay Yeah that's too deep an analysis -- you'd expect the person who's coding the above would optimize it himself/herself. I initially just thought about this optimization because I had initialized lots of variables to null.

chrisaljoudi avatar Dec 23 '12 13:12 chrisaljoudi

@ForbesLindesay There are also cases where defined getters might have side effects, but it's too hard to spot those. That's why I suggested to just make the optimization disabled by default/unsafe.

chrisaljoudi avatar Dec 23 '12 13:12 chrisaljoudi

You wouldn't want it to run for anything that could be a getter, because that's way too likely to be unsafe. Just code the optimisation so it only does it when we know it's safe (i.e. expressions that don't contain property accessors, function calls, object literals, array literals or regexp literals).

Perhaps more significantly, it slows runtime performance if not done carefully: http://jsperf.com/optimizing-var-definitions (needs more runs to be statistically significant).

I suspect it will have negligible or no affect on file size once gzip has done it's thing. It may ultimately not be worth optimising, unless you want to handle the case of a call to a function which has no side affects and always returns the same value. e.g. the performance gain of optimising the following would be huge:

function factorial(n) {
  if (n === 0) return 1;
  else if (n > 0) return n * factorial(n - 1);
}
var a = factorial(50), b = factorial(50);

How often such code actually appears though I'm not sure.

Do we know whether UglifyJS already tracks which functions are 'pure' functions (i.e. no side effects and always return the exact same value for the same input)? If it does, this may be doable, if not then it's a fair bit of work for minimal gain.

ForbesLindesay avatar Dec 23 '12 13:12 ForbesLindesay

@ForbesLindesay Oh interesting benchmark; I'm running Chrome 23 and actually the separate declaration is slower than either the remaining two. You can never really know if there's a getter defined; and it needn't be a property accessor to be unsafe (think about the case where you do window.__defineGetter__("something", someFunctionThatDoesEvilStuff)).

Perhaps you're right about the negligible difference in file size. I'm not sure whether UglifyJS keeps track of such 'pure' functions, perhaps @mishoo can clarify?

chrisaljoudi avatar Dec 23 '12 13:12 chrisaljoudi

OK, anything that can be a getter then, which should include all property accessors and any variable that's not found in the current scope (and therefore could be a property on the window)

ForbesLindesay avatar Dec 23 '12 13:12 ForbesLindesay

this.defineGetter wouldn't make any difference, I've already said we should exempt property accessors and 'global' variables, this.name is a property accessor.

ForbesLindesay avatar Dec 23 '12 13:12 ForbesLindesay

It might need to also be disabled inside with blocks or where there's evals and new Functions, but virtually all optimisations are disabled inside with blocks etc.

ForbesLindesay avatar Dec 23 '12 13:12 ForbesLindesay

@ForbesLindesay Oh. Never mind about my previous comment. I completely missed the point. Sorry!

chrisaljoudi avatar Dec 23 '12 13:12 chrisaljoudi

For anyone else reading:

function test() {
    this.__defineGetter__("x", function() {
        alert("This is evil.");
        return 5;
    });
    return x;
}
console.log(new test());

just throws a reference error and if you're in stict mode then so does the above code. The above code in non-strict mode is just assigning x as a property of the global object so the following is also fine:

(function () {
  var x = 10;
  function test() {
    this.__defineGetter__("x", function() {
        alert("This is evil.");
        return 5;
    });
    return x;
  }
  console.log(test()); //logs 10
}())

ForbesLindesay avatar Dec 23 '12 14:12 ForbesLindesay

This might seem simple enough to wonder “is there a good reason not to do it?”, but it's in fact quite complicated in the general case.

Doing it only for the case var i = XXX, j = XXX is fairly simple, indeed, but won't probably save one byte in 99.9% of cases. Doing it for the case var i = XXX; ...; var j = XXX requires data flow analysis, and there's some serious theory behind this term. I investigated the possibilities but for the near future I don't think we'll have this (it's complicated by two issues: (1) closures, and (2) JS is a complex beast; normally compilers do DFA on a very simple intermediate language, not on the original source).

This comment also applies to mishoo/UglifyJS2#68 and mishoo/UglifyJS2#27 — data flow analysis would help dealing with them too.

mishoo avatar Dec 23 '12 16:12 mishoo

@mishoo I see. Well, I just tested with the Google Closure Compiler (it actually took me some time to work around it optimizing away the variables and replacing them with their values). It appears it doesn't do such an optimization, either. Thanks for your explanation!

chrisaljoudi avatar Dec 23 '12 16:12 chrisaljoudi

This could be easily doable by using register-based bytecode, instead of AST, as IR. You'd lose comments but you could always add a bytecode for comments if you wanted to :P

SoniEx2 avatar Sep 07 '15 00:09 SoniEx2

@SoniEx2 Unless you're very careful with your design, transformations on that bytecode would result in behaviour that cannot be translated back to JavaScript.

michaelficarra avatar Sep 07 '15 16:09 michaelficarra

@michaelficarra You can turn Lua bytecode back and forth, at least... altho slightly inefficiently for some things... Maybe optimize based on bytecode and minify based on JS? (JS -> bytecode -> optimize -> JS -> minify)

SoniEx2 avatar Sep 07 '15 16:09 SoniEx2

Sure, it can be done, but that optimisation step is going to be tricky. You have to make sure the bytecode's overall structure remains representable in JavaScript, or at least can be filled in to be representable. As an example, a bytecode with arbitrary jumps can jump into the middle of a "loop", but that behaviour cannot be represented in JavaScript; at least, not easily.

michaelficarra avatar Sep 07 '15 16:09 michaelficarra

I see what you mean. But if you want to optimize efficiently an IR is usually the way to go.

SoniEx2 avatar Sep 07 '15 17:09 SoniEx2