istf-spec icon indicating copy to clipboard operation
istf-spec copied to clipboard

Optimization: a Stack/Heap approach

Open thysultan opened this issue 7 years ago • 50 comments

Instead of

body {
  color: red
}

producing

[
  [RULE_START, 1],
    [SELECTOR, 'body'],
    [PROPERTY, 'color'],
    [VALUE, 'red']
  [RULE_END]
]

It could instead produce

HEAP = ['body', 'color', 'red']

[
RULE_START, 0,
SELECTOR, 0,
PROPERTY, 1,
VALUE, 2,
RULE_END, 2
]

Where the ints 0,1,2 are memory addresses for the corresponding values in the HEAP.

This way you can use a flat Uint32Array typed array for maximum throughput on performance.

thysultan avatar Jun 21 '17 21:06 thysultan

Ah I got what you mean - using separate arrays for markers and props/values/selectors!

kof avatar Jun 21 '17 21:06 kof

Thats an interesting idea, we should add a benchmark for this as well.

kof avatar Jun 21 '17 21:06 kof

We could in theory end up with 3 arrays. Function refs, strings and markers. Each of them would be of one consistent type and markers even integers.

kof avatar Jun 21 '17 21:06 kof

It could also mean we could reuse selectors, properties and property values. i.e

body {
  color: red;
}

h1 {
color: red;
}

Would produce

HEAP = ['body', 'color', 'red', 'h1']

[
RULE_START, 0,
SELECTOR, 0,
PROPERTY, 1,
VALUE, 2,
RULE_END, 2,

RULE_START, 3,
SELECTOR, 3,
PROPERTY, 1,
VALUE, 2,
RULE_END, 3
]

thysultan avatar Jun 21 '17 21:06 thysultan

Oh so the HEAP array would become a collection of unique strings, neat! Not sure though how much it will bring after gzip, but def. worth considering!

kof avatar Jun 21 '17 21:06 kof

Yes gzip might make the KB benefit a hard sell but It should improve the memory that the runtime JS VM has to allocate and code it has to parse.

thysultan avatar Jun 21 '17 21:06 thysultan

Yeah, if it won't slow down parsing, definitely worth optimizing!

kof avatar Jun 21 '17 21:06 kof

WIP of a compiler that produces this from css strings. https://rawgit.com/thysultan/stylis.js/master/plugins/bytecode/tests/index.html

thysultan avatar Jun 21 '17 23:06 thysultan

How is it related to this issue?

kof avatar Jun 22 '17 00:06 kof

Yes, this is not for the bench, just a proof of concept of a compiler that generates this format from a string source, related issue #16

thysultan avatar Jun 22 '17 00:06 thysultan

Ok I have added this approach to the bench (see fromTwoTyptedArrays fn) https://esbench.com/bench/592d599e99634800a03483d8

I din't use Uint32Array because it slows it down a lot.

It is not too fast. It is possible that with a much larger test data it will show better relative performance, but mostly I think its a memory optimization, because we add more lookups when using this approach.

The "From Array of strings" test performs best so far.

kof avatar Jun 22 '17 10:06 kof

For the tests to be fair you'd need to exclude the creation of the arrays from the bench and put that into the setup phase otherwise you're also measuring the creating of the data structure, which in this case is meant to be a preprocess operation instead of a runtime operation.

thysultan avatar Jun 22 '17 15:06 thysultan

I do it for other tests as well, so its comparable

kof avatar Jun 22 '17 15:06 kof

Since the different methods use very different data-structures that are created on each run it would not be comparable since the creating of the data-structures is meant to be a compile time operation, this would include fromArrayOfStrings as well.

thysultan avatar Jun 22 '17 15:06 thysultan

You're also measuring the cost of creating the function on every operation i.e return (input) => { this could throw out any possibility that the VM could generate optimized code which means we might be just measuring a baseline(unoptimized) version that the VM could produce.

thysultan avatar Jun 22 '17 15:06 thysultan

For example this bench https://rawgit.com/thysultan/stylis.js/master/plugins/bytecode/tests/index.html, *see the console.

It avoids all the points i mentioned above, showing that this bytecode format with a typedArray is faster.

1873 ops, from typedArray
87 ops, from stylis
1638 ops, fromSingleArrayOfStrings

thysultan avatar Jun 22 '17 15:06 thysultan

You're also measuring the cost of creating the function on every operation i.e return (input) => { this could throws out any possibility that the VM could generate optimized code which means we might be just measuring a baseline(unoptimized) version that the VM could produce.

Can you elaborate on this specific case?

kof avatar Jun 22 '17 17:06 kof

For example this bench https://rawgit.com/thysultan/stylis.js/master/plugins/bytecode/tests/index.html, *see the console.

on my machine in chrome is "fromSingleArrayOfStrings" the fastest

kof avatar Jun 22 '17 17:06 kof

Can you elaborate on this specific case?

The operation is calling a function that returns another function, then useIt calls this function.

So, you're also measuring the cost of creating the function that converts the data-structure to a string which varies across implementations, this paired with the fact that it's being generated on every operations means that the function will also never get labeled as hot by the VM to produce optimized code since every run is both executing and creating a new function.

On my machine is "fromSingleArrayOfStrings" the fastest.

What where the results?

thysultan avatar Jun 22 '17 17:06 thysultan

The operation is calling a function that returns another function

Did you see that the first function is self invoking? I scoped the static internals inside of the closure. At bench time there is only one fn call.

// then useIt calls this function.

I am not sure it does what it should, but my idea was to avoid smart optimizations which might end up with doing nothing. I am still not sure if it does the job nor if useIt function was actually necessarily in this case, but in any case all tests use it so they are fair.

So, you're also measuring the cost of creating the function that converts the data-structure to a string which varies across implementations

I don't think so.

What where the results?

Just refreshed a couple of times, the results vary between 2500 and 3500, this time the "bytecode" variant was faster multiple times. I am not sure the way you measure is correct. Did you try benchmark.js? I guess they have bullet proof logic for measuring.

kof avatar Jun 22 '17 17:06 kof

Just noticed you are using "while" in one case and "for" in another, can you try making both implementations as similar as possible?

kof avatar Jun 22 '17 17:06 kof

Regarding not measuring the data generation time. I realize that we would have better parsing results when we not generate the data for each test, but this would be less realistic, because in the practice we would generate the data first, then parse, both steps are involved in the real world scenario, right?

kof avatar Jun 22 '17 17:06 kof

Regarding warming up and getting optimized hot code. I am not sure this is the right approach here. I am not sure that function will be called that often in a real world use case so that it becomes hot. Its not a webserver working same operations all the time. It will parse how many?, lets say 100 style sheets on average and its done. Also if done right, it will never parse all of them at once, but rather only those which are required for the current UI state.

For more realistic statistics we could count amount of css rules on popular websites.

kof avatar Jun 22 '17 17:06 kof

Did you see that the first function is self invoking?

I missed that :P

Just refreshed a couple of times, the results vary between 2500 and 3500

Same here.

I am not sure the way you measure is correct.

What part is incorrect?

Regarding not measuring the data generation time? I realize that we would have better parsing results when we not generate the data for each test, but this would be less realistic, because in the praxis we would generate the data first, then parse, both steps are involved in the real world scenario, right?

No, you would not do this on every operation at runtime, the data-structure would be created once at runtime. For example

// data-structure
class Main extends React.Component {}

// X number of operations
ReactDOM.render(<Main>, document.body)
ReactDOM.render(<Main>, document.body)

Just noticed you are using "while" in one case and "for" in another, can you try making both implementations as similar as possible?

Sure, i copied you're implementation of fromSingleArrayOfStrings in the ESBench to compare with that.

thysultan avatar Jun 22 '17 17:06 thysultan

What part is incorrect?

It seems correct, but very simplistic, I think jsperf did a lot of work to get better/more stable results.

No, you would not do this on every operation at runtime, the data-structure would be created once at runtime.

Yes but it would be also parsed just once I assume. After that its either something static or some models.

kof avatar Jun 22 '17 17:06 kof

In the bench it's being created on every operation, since the bench runs this operation > 10,000 times it affects the results.

thysultan avatar Jun 22 '17 18:06 thysultan

In the bench it's being created on every operation, since the bench runs this operation > 10,000 times it affects the results

You mean data generation for every test? Yes thats what I was trying to reproduce. I assume a test case includes data generation + parsing in the real world.

kof avatar Jun 22 '17 18:06 kof

We can make for sure separate tests with data generation and without. Just to be sure that data generation doesn't eat up most of the performance budget. But at the end, important is a test which has both, data structure and parsing. Basically if data generation costs too much time but parsing is faster it doesn't matter, important is whats faster when both is done.

kof avatar Jun 22 '17 18:06 kof

What might be really important is to do tests with more realistic data, meaning instead of 1 rule, using an average rule amount from some popular sites. Larger data sets might completely change the benchmark results.

kof avatar Jun 22 '17 18:06 kof

In the real world the data-structure should probably be created just once for the lifetime of the page, the toString implementation on the other hand could get executed more than once.

What might be really important is to do tests with more realistic data, meaning instead of 1 rule, using an average rule amount from some popular sites.

I agree.

thysultan avatar Jun 22 '17 18:06 thysultan